small query, turning off optimizations such as parallel execution and native code generation. The
overhead of these optimizations pays off for queries involving substantial amounts of data, but it
makes sense to skip them for queries involving tiny amounts of data. Reducing the overhead for small queries
allows Impala to complete them more quickly, keeping YARN resources, admission control slots, and so on
available for data-intensive queries.
Type: numeric
Default: 100
Usage notes: Typically, you increase the default value to make this optimization apply to more queries.
If incorrect or corrupted table and column statistics cause Impala to apply this optimization
incorrectly to queries that actually involve substantial work, those queries might run more slowly as a
result of remote reads. In that case, recompute statistics with the
This setting applies to query fragments where the amount of data to scan can be accurately determined, either
through table and column statistics, or by the presence of a
In
For a query that is determined to be small, all work is performed on the coordinator node. This might
result in some I/O being performed by remote reads. The savings from not distributing the query work and not
generating native code are expected to outweigh any overhead from the remote reads.
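As a minimal sketch of raising the threshold for one session in impala-shell: the option name EXEC_SINGLE_NODE_ROWS_THRESHOLD is assumed here because this section does not spell it out, and the value 500 and the table name sample_table are hypothetical.

```sql
-- Assumption: the option described in this section is EXEC_SINGLE_NODE_ROWS_THRESHOLD.
-- Raise the cutoff from the default of 100 rows to 500 rows for the current session,
-- so that more queries qualify for the small-query optimization.
SET EXEC_SINGLE_NODE_ROWS_THRESHOLD=500;

-- Hypothetical table. If Impala can determine that this query scans fewer rows
-- than the threshold, it runs the work on the coordinator node only.
SELECT * FROM sample_table LIMIT 5;
```

A value set this way lasts only for the current session, which makes it easy to compare the behavior of the same query under different thresholds.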
A common use case is to query just a few rows from a table to inspect typical data values. In this example, Impala does not parallelize the query or perform native code generation because the result set is guaranteed to be smaller than the threshold value from this query option: