impala

mirror of https://github.com/apache/impala.git synced 2026-02-02 15:00:38 -05:00

Author	SHA1	Message	Date
Aman Sinha	e644c99724	IMPALA-10723: Treat materialized view as a table instead of a view The existing behavior is that materialized views are treated as views and therefore expanded similar to a view when one queries the MV directly (SELECT * FROM materialized_view). This is incorrect since an MV is a regular table with physical properties such as partitioning, clustering etc. and should be treated as such even though it has a view definition associated with it. This patch focuses on the use case where MVs are created as HDFS tables and makes the MVs a derived class of HdfsTable, therefore making it a Table object. It adds support for collecting and displaying statistics on materialized views and these statistics could be leveraged by an external frontend that supports MV based query rewrites (note that such a rewrite is not supported by Impala with or without this patch). Note that we are not introducing new syntax for MVs since DDL, DML operations on MVs are only supported through Hive. Directly querying a MV is permitted but inserts into MVs is not since MVs are supposed to be only modified through an external refresh when the source tables have modifications. If the source tables associated with a materialized view have column masking or row-filtering Ranger policies, querying the MV will throw an error. This behavior is consistent with that of Hive. Testing: - Added transactional tables for alltypes, jointbl and used them as source tables to create materialized view. - Added tests for compute stats, drop stats, show stats and simple select query on a materialized view. - Added test for select on a materialized view when the source table has a column mask. - Modified analyzer tests related to alter, insert, drop of materialized view. Change-Id: If3108996124c6544a97fb0c34b6aff5e324a6cff Reviewed-on: http://gerrit.cloudera.org:8080/17595 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>	2022-04-14 11:56:20 +00:00
Taras Bobrovytsky	7faaa65996	Added order by query tests - Added static order by tests to test_queries.py and QueryTest/sort.test - test_order_by.py also contains tests with static queries that are run with multiple memory limits. - Added stress, scratch disk and failpoints tests - Incorporated Srinath's change that copied all order by with limit tests into the top-n.test file Extra time required: Serial: scratch disk: 42 seconds test queries sort : 77 seconds test sort: 56 seconds sort stress: 142 seconds TOTAL: 5 min 17 seconds Parallel(8 threads): scratch disk: 40 seconds test queries sort: 42 seconds test sort: 49 seconds sort stress: 93 seconds TOTAL: 3 min 44 sec Change-Id: Ic5716bcfabb5bb3053c6b9cebc9bfbbb9dc64a7c Reviewed-on: http://gerrit.ent.cloudera.com:8080/2820 Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/3205	2014-06-20 13:35:10 -07:00
Nong Li	6e691f9500	IMPALA-1010: Remove Close() of build side in blocking join node. This optimization is generally not safe since the probe side is still streaming. The join node could acquire all of the data from the child into its own pool but then there's no real point in doing this (doesn't lead to lower memory footprint and just makes the mem accounting harder to reason about). This is exposed in busy plans. Change-Id: I37b0f6507dc67c79e5ebe8b9242ec86f28ddad41 Reviewed-on: http://gerrit.ent.cloudera.com:8080/2747 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: jenkins	2014-05-30 11:50:50 -07:00
Alex Behm	b252921363	IMPALA-994: Handle incorrect column metadata in views created by Hive. Change-Id: I3fba08d191c479f37371ce50fd07b8476a73eba2 Reviewed-on: http://gerrit.ent.cloudera.com:8080/2613 Reviewed-by: Marcel Kornacker <marcel@cloudera.com> Tested-by: jenkins Reviewed-on: http://gerrit.ent.cloudera.com:8080/2618 Reviewed-by: Alex Behm <alex.behm@cloudera.com>	2014-05-19 20:17:23 -07:00
Aaron Davidson	cafb7b72f8	External sorting This is an experimental implementation of external sorting. This patch includes the following additions: (1) creation and implementation of the Sorter interface, which can sort Impala Tuples. (2) normalization of Tuples to allow memcmp-able sorting. (3) a testing framework for the Sorter, (4) a benchmark to compare the current state of the Sorter with other sorts, (5) an implementation of a Vector which can store data whose size is only known at runtime, (6) a sorting algorithm (basically a dumbed down STL sort) which can operate over such a vector, (7) implementation of a simple in-memory Merger, and (8) logic to stream blocks of memory in and out of memory for the actual external merging. I have a local branch for experimental optimizations and benchmarking -- this should be considered a "basic", working sort. The following optimizations have been implemented: (i) Optionally extracting keys instead of writing them in place. (ii) Optionally opportunistically parallelize run building (sorting & prepare for output). (iii) Maximize disk IO and minimize buffer recycling by writing buffers out, but also keeping them in memory until right when they're needed. (iv) Prepare auxiliary data backwards so the buffers can be released as we go, and still go out in an order which preserves the first buffers of the run. (v) Always merge maximum number of runs at a time, taking from the next merge level if available. Change-Id: I1d7304d54d73152da929b1efffc1e851e5fb8fd4 Reviewed-on: http://gerrit.ent.cloudera.com:8080/126 Reviewed-by: Nong Li <nong@cloudera.com> Tested-by: Aaron Davidson <aaron.davidson@cloudera.com>	2014-01-08 10:52:27 -08:00
Alex Behm	8ad15fabcf	IMPALA-372: Added CREATE/DROP/ALTER VIEW.	2014-01-08 10:51:35 -08:00

6 Commits