Files
impala/docs/topics/impala_stddev.xml
John Russell 8377b9949c Global search/replace: audience="Cloudera" -> audience="hidden".
For this change to land in master, the audience="hidden" code review
needs to be completed first. Otherwise, the doc build would still work
but the audience="hidden" content would be visible rather than hidden as
desired.

Some work happening in parallel might introduce additional instances of
audience="Cloudera". I suggest addressing those in a followup CR so this
global change can land quickly.

Since the changes apply across so many different files, but are so
narrow in scope, I suggest that the way to validate (check that no
extraneous changes were introduced accidentally) is to diff just the
changed lines:

git diff -U0 HEAD^ HEAD

In patch set 2, I updated other topics marked audience="Cloudera"
by CRs that were pushed in the meantime.

Change-Id: Ic93d89da77e1f51bbf548a522d98d0c4e2fb31c8
Reviewed-on: http://gerrit.cloudera.org:8080/5613
Reviewed-by: John Russell <jrussell@cloudera.com>
Tested-by: Impala Public Jenkins
2017-01-18 19:31:57 +00:00

137 lines
5.4 KiB
XML

<?xml version="1.0" encoding="UTF-8"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
<concept rev="1.4" id="stddev">
<title>STDDEV, STDDEV_SAMP, STDDEV_POP Functions</title>
<titlealts audience="PDF"><navtitle>STDDEV, STDDEV_SAMP, STDDEV_POP</navtitle></titlealts>
<prolog>
<metadata>
<data name="Category" value="Impala"/>
<data name="Category" value="SQL"/>
<data name="Category" value="Impala Functions"/>
<data name="Category" value="Aggregate Functions"/>
<data name="Category" value="Querying"/>
<data name="Category" value="Developers"/>
<data name="Category" value="Data Analysts"/>
</metadata>
</prolog>
<conbody>
<p>
<indexterm audience="hidden">stddev() function</indexterm>
<indexterm audience="hidden">stddev_samp() function</indexterm>
<indexterm audience="hidden">stddev_pop() function</indexterm>
An aggregate function that
<xref href="http://en.wikipedia.org/wiki/Standard_deviation" scope="external" format="html">standard
deviation</xref> of a set of numbers.
</p>
<p conref="../shared/impala_common.xml#common/syntax_blurb"/>
<codeblock>{ STDDEV | STDDEV_SAMP | STDDEV_POP } ([DISTINCT | ALL] <varname>expression</varname>)</codeblock>
<p>
This function works with any numeric data type.
</p>
<p conref="../shared/impala_common.xml#common/former_odd_return_type_string"/>
<p>
This function is typically used in mathematical formulas related to probability distributions.
</p>
<p>
The <codeph>STDDEV_POP()</codeph> and <codeph>STDDEV_SAMP()</codeph> functions compute the population
standard deviation and sample standard deviation, respectively, of the input values.
(<codeph>STDDEV()</codeph> is an alias for <codeph>STDDEV_SAMP()</codeph>.) Both functions evaluate all input
rows matched by the query. The difference is that <codeph>STDDEV_SAMP()</codeph> is scaled by
<codeph>1/(N-1)</codeph> while <codeph>STDDEV_POP()</codeph> is scaled by <codeph>1/N</codeph>.
</p>
<p>
If no input rows match the query, the result of any of these functions is <codeph>NULL</codeph>. If a single
input row matches the query, the result of any of these functions is <codeph>"0.0"</codeph>.
</p>
<p conref="../shared/impala_common.xml#common/example_blurb"/>
<p>
This example demonstrates how <codeph>STDDEV()</codeph> and <codeph>STDDEV_SAMP()</codeph> return the same
result, while <codeph>STDDEV_POP()</codeph> uses a slightly different calculation to reflect that the input
data is considered part of a larger <q>population</q>.
</p>
<codeblock>[localhost:21000] &gt; select stddev(score) from test_scores;
+---------------+
| stddev(score) |
+---------------+
| 28.5 |
+---------------+
[localhost:21000] &gt; select stddev_samp(score) from test_scores;
+--------------------+
| stddev_samp(score) |
+--------------------+
| 28.5 |
+--------------------+
[localhost:21000] &gt; select stddev_pop(score) from test_scores;
+-------------------+
| stddev_pop(score) |
+-------------------+
| 28.4858 |
+-------------------+
</codeblock>
<p>
This example demonstrates that, because the return value of these aggregate functions is a
<codeph>STRING</codeph>, you must currently convert the result with <codeph>CAST</codeph>.
</p>
<codeblock>[localhost:21000] &gt; create table score_stats as select cast(stddev(score) as decimal(7,4)) `standard_deviation`, cast(variance(score) as decimal(7,4)) `variance` from test_scores;
+-------------------+
| summary |
+-------------------+
| Inserted 1 row(s) |
+-------------------+
[localhost:21000] &gt; desc score_stats;
+--------------------+--------------+---------+
| name | type | comment |
+--------------------+--------------+---------+
| standard_deviation | decimal(7,4) | |
| variance | decimal(7,4) | |
+--------------------+--------------+---------+
</codeblock>
<p conref="../shared/impala_common.xml#common/restrictions_blurb"/>
<p conref="../shared/impala_common.xml#common/analytic_not_allowed_caveat"/>
<p conref="../shared/impala_common.xml#common/related_info"/>
<p>
The <codeph>STDDEV()</codeph>, <codeph>STDDEV_POP()</codeph>, and <codeph>STDDEV_SAMP()</codeph> functions
compute the standard deviation (square root of the variance) based on the results of
<codeph>VARIANCE()</codeph>, <codeph>VARIANCE_POP()</codeph>, and <codeph>VARIANCE_SAMP()</codeph>
respectively. See <xref href="impala_variance.xml#variance"/> for details about the variance property.
</p>
</conbody>
</concept>