Mirror of https://github.com/apache/impala.git, synced 2026-01-27 06:10:53 -05:00
2db16efda8e952604632fcd7a2718bda99edcdae
The plan generation is heuristic. A SubplanNode is placed as low as possible in the plan tree, as soon as its required parent tuple ids are materialized. This approach is simple to understand and implement, but not always optimal. For example, it may be better to place a Subplan after a selective join, but today we will place it below the join if it is correct to do so. For such scenarios, the straight_join hint can be used to manually tune the join and Subplan order. If straight_join is used, correlated and child table refs are placed into the same SubplanNode if they are adjacent in the FROM clause.

Change-Id: I53e4623eb58f8b7ad3d02be15ad8726769f6f8c9
Reviewed-on: http://gerrit.cloudera.org:8080/401
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
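The "as low as possible" placement described in the commit message above can be illustrated with a toy model. This is a minimal sketch, not Impala's planner code: `TableRef`, `place_subplans`, and the table names are hypothetical, and the real planner works on plan nodes rather than a flat list of steps. The only rule modeled is that a correlated ref is placed as soon as every parent tuple id it requires has been materialized.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableRef:
    """Hypothetical stand-in for a FROM-clause entry (not an Impala class)."""
    name: str
    tuple_id: int
    # Tuple ids of the parent rows a correlated ref depends on.
    # Empty for an ordinary top-level table.
    required_parents: frozenset = frozenset()


def place_subplans(refs):
    """Return plan steps bottom-up, placing each correlated ref greedily."""
    top_level = [r for r in refs if not r.required_parents]
    pending = [r for r in refs if r.required_parents]
    materialized = set()
    steps = []
    for ref in top_level:
        # The first top-level table is scanned; later ones are joined in.
        steps.append(("scan" if not steps else "join", ref.name))
        materialized.add(ref.tuple_id)
        # Greedy rule: emit every subplan whose parents are now available,
        # repeating because a placed subplan may unblock another one.
        progress = True
        while progress:
            progress = False
            for sub in list(pending):
                if sub.required_parents <= materialized:
                    steps.append(("subplan", sub.name))
                    materialized.add(sub.tuple_id)
                    pending.remove(sub)
                    progress = True
    if pending:
        raise ValueError("unsatisfiable parent tuple ids: "
                         + ", ".join(s.name for s in pending))
    return steps


# Hypothetical query shape: customers c, c.orders o, nation n
refs = [
    TableRef("customers", 0),
    TableRef("c.orders", 1, frozenset({0})),
    TableRef("nation", 2),
]
steps = place_subplans(refs)
for kind, name in steps:
    print(kind, name)
```

In this toy run the subplan over `c.orders` lands directly above the scan of `customers` and below the join with `nation`, even if that join would have been selective enough to make the opposite order cheaper. That is the trade-off the commit message points at; the sketch does not model the straight_join behavior.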
Welcome to Impala
Lightning-fast, distributed SQL queries for petabytes of data stored in Apache Hadoop clusters.
Impala is a modern, massively distributed, massively parallel C++ query engine that lets you analyze, transform, and combine data from a variety of data sources. It offers:
- Best-of-breed performance and scalability.
- Support for data stored in HDFS, Apache HBase and Amazon S3.
- Wide analytic SQL support, including window functions and subqueries.
- On-the-fly code generation using LLVM to generate CPU-efficient code tailored specifically to each individual query.
- Support for the most commonly used Hadoop file formats, including the Apache Parquet (incubating) project.
- Apache-licensed, 100% open source.
More about Impala
To learn more about Impala as a business user, or to try Impala live or in a VM, please visit the Impala homepage.
If you are interested in contributing to Impala as a developer, or learning more about Impala's internals and architecture, visit the Impala wiki.
Languages
- C++: 49.2%
- Java: 30.5%
- Python: 14.5%
- JavaScript: 1.3%
- C: 1.2%
- Other: 3.2%