impala/docs/topics/impala_min.xml
John Russell 8377b9949c Global search/replace: audience="Cloudera" -> audience="hidden".
For this change to land in master, the audience="hidden" code review
needs to be completed first. Otherwise, the doc build would still work
but the audience="hidden" content would be visible rather than hidden as
desired.

Some work happening in parallel might introduce additional instances of
audience="Cloudera". I suggest addressing those in a followup CR so this
global change can land quickly.

Since the changes apply across so many different files, but are so
narrow in scope, I suggest that the way to validate (check that no
extraneous changes were introduced accidentally) is to diff just the
changed lines:

git diff -U0 HEAD^ HEAD

In patch set 2, I updated other topics marked audience="Cloudera"
by CRs that were pushed in the meantime.

Change-Id: Ic93d89da77e1f51bbf548a522d98d0c4e2fb31c8
Reviewed-on: http://gerrit.cloudera.org:8080/5613
Reviewed-by: John Russell <jrussell@cloudera.com>
Tested-by: Impala Public Jenkins
2017-01-18 19:31:57 +00:00

<?xml version="1.0" encoding="UTF-8"?><!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
<concept id="min">
<title>MIN Function</title>
<titlealts audience="PDF"><navtitle>MIN</navtitle></titlealts>
<prolog>
<metadata>
<data name="Category" value="Impala"/>
<data name="Category" value="SQL"/>
<data name="Category" value="Impala Functions"/>
<data name="Category" value="Analytic Functions"/>
<data name="Category" value="Aggregate Functions"/>
<data name="Category" value="Querying"/>
<data name="Category" value="Developers"/>
<data name="Category" value="Data Analysts"/>
</metadata>
</prolog>
<conbody>
<p>
<indexterm audience="hidden">min() function</indexterm>
An aggregate function that returns the minimum value from a set of numbers. Opposite of the
<codeph>MAX</codeph> function. Its single argument can be a numeric column, or the numeric result of a function
or expression applied to the column value. Rows with a <codeph>NULL</codeph> value for the specified column
are ignored. If the table is empty, or all the values supplied to <codeph>MIN</codeph> are
<codeph>NULL</codeph>, <codeph>MIN</codeph> returns <codeph>NULL</codeph>.
</p>
<p conref="../shared/impala_common.xml#common/syntax_blurb"/>
<codeblock>MIN([DISTINCT | ALL] <varname>expression</varname>) [OVER (<varname>analytic_clause</varname>)]</codeblock>
<p>
When the query contains a <codeph>GROUP BY</codeph> clause, returns one value for each combination of
grouping values.
</p>
<p conref="../shared/impala_common.xml#common/restrictions_sliding_window"/>
<p conref="../shared/impala_common.xml#common/return_type_same_except_string"/>
<p conref="../shared/impala_common.xml#common/usage_notes_blurb"/>
<p conref="../shared/impala_common.xml#common/partition_key_optimization"/>
<p conref="../shared/impala_common.xml#common/complex_types_blurb"/>
<p conref="../shared/impala_common.xml#common/complex_types_aggregation_explanation"/>
<p conref="../shared/impala_common.xml#common/complex_types_aggregation_example"/>
<p conref="../shared/impala_common.xml#common/example_blurb"/>
<codeblock>-- Find the smallest value for this column in the table.
select min(c1) from t1;
-- Find the smallest value for this column from a subset of the table.
select min(c1) from t1 where month = 'January' and year = '2013';
-- Find the smallest value from a set of numeric function results.
select min(length(s)) from t1;
-- Can also be used in combination with DISTINCT and/or GROUP BY.
-- With GROUP BY, the query returns one row for each group.
select month, year, min(purchase_price) from store_stats group by month, year;
-- Filter the input to eliminate duplicates before performing the calculation.
select min(distinct x) from t1;
</codeblock>
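<p>
As a minimal sketch of the <codeph>NULL</codeph> handling described earlier, the following statements use a
hypothetical table <codeph>min_demo</codeph>. The table name, column name, and values are assumptions made
for illustration and are not part of the examples above.
</p>
<codeblock>-- Hypothetical table containing one NULL value.
create table min_demo (x int);
insert into min_demo values (3), (null), (1);
-- The NULL row is ignored, so the result is 1.
select min(x) from min_demo;
-- Every value supplied to MIN() is NULL, so the result is NULL.
select min(x) from min_demo where x is null;
-- No rows match, so MIN() has no input values and the result is NULL.
select min(x) from min_demo where x = 100;
</codeblock>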
<p rev="2.0.0">
The following examples show how to use <codeph>MIN()</codeph> in an analytic context. They use a table
containing integers from 1 to 10. Notice how the <codeph>MIN()</codeph> result is reported for each input row,
unlike a query with a <codeph>GROUP BY</codeph> clause, which condenses the result set.
<codeblock>select x, property, min(x) over (partition by property) as min from int_t where property in ('odd','even');
+----+----------+-----+
| x | property | min |
+----+----------+-----+
| 2 | even | 2 |
| 4 | even | 2 |
| 6 | even | 2 |
| 8 | even | 2 |
| 10 | even | 2 |
| 1 | odd | 1 |
| 3 | odd | 1 |
| 5 | odd | 1 |
| 7 | odd | 1 |
| 9 | odd | 1 |
+----+----------+-----+
</codeblock>
Adding an <codeph>ORDER BY</codeph> clause lets you experiment with results that are cumulative or apply to a moving
set of rows (the <q>window</q>). The following examples use <codeph>MIN()</codeph> in an analytic context
(that is, with an <codeph>OVER()</codeph> clause) to display the smallest value of <codeph>X</codeph>
encountered up to each row in the result set. The examples use two columns in the <codeph>ORDER BY</codeph>
clause to produce a sequence of values that rises and falls, to illustrate how the <codeph>MIN()</codeph>
result only decreases or stays the same throughout each partition within the result set.
The basic <codeph>ORDER BY x</codeph> clause implicitly
activates a window clause of <codeph>RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW</codeph>.
For this data, where no two rows tie on the <codeph>ORDER BY</codeph> columns, that clause is effectively
the same as <codeph>ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW</codeph>.
Therefore, all of these examples produce the same results:
<codeblock>select x, property, min(x) <b>over (order by property, x desc)</b> as 'minimum to this point'
from int_t where property in ('prime','square');
+---+----------+-----------------------+
| x | property | minimum to this point |
+---+----------+-----------------------+
| 7 | prime | 7 |
| 5 | prime | 5 |
| 3 | prime | 3 |
| 2 | prime | 2 |
| 9 | square | 2 |
| 4 | square | 2 |
| 1 | square | 1 |
+---+----------+-----------------------+
select x, property,
min(x) over
(
<b>order by property, x desc</b>
<b>range between unbounded preceding and current row</b>
) as 'minimum to this point'
from int_t where property in ('prime','square');
+---+----------+-----------------------+
| x | property | minimum to this point |
+---+----------+-----------------------+
| 7 | prime | 7 |
| 5 | prime | 5 |
| 3 | prime | 3 |
| 2 | prime | 2 |
| 9 | square | 2 |
| 4 | square | 2 |
| 1 | square | 1 |
+---+----------+-----------------------+
select x, property,
min(x) over
(
<b>order by property, x desc</b>
<b>rows between unbounded preceding and current row</b>
) as 'minimum to this point'
from int_t where property in ('prime','square');
+---+----------+-----------------------+
| x | property | minimum to this point |
+---+----------+-----------------------+
| 7 | prime | 7 |
| 5 | prime | 5 |
| 3 | prime | 3 |
| 2 | prime | 2 |
| 9 | square | 2 |
| 4 | square | 2 |
| 1 | square | 1 |
+---+----------+-----------------------+
</codeblock>
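As a sketch of the per-partition behavior, the following query adds a <codeph>PARTITION BY</codeph> clause
when querying the same <codeph>int_t</codeph> table. The query and its column alias are illustrative assumptions
rather than original examples. Because the running minimum is computed separately for each
<codeph>PROPERTY</codeph> value, it starts over in the <codeph>square</codeph> rows instead of carrying over
the value 2 from the <codeph>prime</codeph> rows.
<codeblock>-- Illustrative variation: compute a separate running minimum for each PROPERTY value.
select x, property,
min(x) over
(
<b>partition by property</b>
<b>order by x desc</b>
) as 'minimum within property'
from int_t where property in ('prime','square')
order by property, x desc;
</codeblock>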
The following examples show how to construct a moving window, with a running minimum taking into account all rows before
and 1 row after the current row.
Because of a restriction in the Impala <codeph>RANGE</codeph> syntax, this type of
moving window is possible with the <codeph>ROWS BETWEEN</codeph> clause but not the <codeph>RANGE BETWEEN</codeph> clause.
Because of an extra Impala restriction on the <codeph>MAX()</codeph> and <codeph>MIN()</codeph> functions in an
analytic context, the lower bound must be <codeph>UNBOUNDED PRECEDING</codeph>.
<codeblock>select x, property,
min(x) over
(
<b>order by property, x desc</b>
<b>rows between unbounded preceding and 1 following</b>
) as 'local minimum'
from int_t where property in ('prime','square');
+---+----------+---------------+
| x | property | local minimum |
+---+----------+---------------+
| 7 | prime | 5 |
| 5 | prime | 3 |
| 3 | prime | 2 |
| 2 | prime | 2 |
| 9 | square | 2 |
| 4 | square | 1 |
| 1 | square | 1 |
+---+----------+---------------+
-- Doesn't work because of syntax restriction on RANGE clause.
select x, property,
min(x) over
(
<b>order by property, x desc</b>
<b>range between unbounded preceding and 1 following</b>
) as 'local minimum'
from int_t where property in ('prime','square');
ERROR: AnalysisException: RANGE is only supported with both the lower and upper bounds UNBOUNDED or one UNBOUNDED and the other CURRENT ROW.
</codeblock>
</p>
<p conref="../shared/impala_common.xml#common/related_info"/>
<p>
<xref href="impala_analytic_functions.xml#analytic_functions"/>, <xref href="impala_max.xml#max"/>,
<xref href="impala_avg.xml#avg"/>
</p>
</conbody>
</concept>