This repository has been archived on 2025-12-25. You can view files and clone it. You cannot open issues or pull requests or push a commit.
Files
tcommon-studio-se/org.talend.core.runtime/resources/PigExpressionBuilder.xml
2013-07-22 05:16:36 +00:00

1251 lines
35 KiB
XML
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
<?xml version="1.0" encoding="UTF-8"?>
<Functions name="PIG" language="pig" version="0.10.0">
<!-- Pig Eval Functions start -->
<Category name="Pig Eval Functions">
<Function name="AVG">
<Desc>Computes the average of the numeric values in a single-column bag.</Desc>
<Syntax>AVG(expression)</Syntax>
<Terms>
<Term name="expression">
<Desc>
Any expression whose result is a bag. The elements of the bag
should be data type int, long, float, or double.
</Desc>
</Term>
</Terms>
<Usage>
Use the AVG function to compute the average of the numeric
values in a single-column bag. AVG requires a preceding GROUP ALL
statement for global averages and a GROUP BY statement for group
averages.
The AVG function now ignores NULL values.
</Usage>
<Example>
In this example the average GPA for each student is computed
(see the GROUP operator for information about the field names in
relation B).
A = LOAD 'student.txt' AS (name:chararray,term:chararray,gpa:float);
DUMP A;
(John,fl,3.9F)
(John,wt,3.7F)
(John,sp,4.0F)
(John,sm,3.8F)
(Mary,fl,3.8F)
(Mary,wt,3.9F)
(Mary,sp,4.0F)
(Mary,sm,4.0F)
B = GROUP A BY name;
DUMP B;
(John,{(John,fl,3.9F),(John,wt,3.7F),(John,sp,4.0F),(John,sm,3.8F)})
(Mary,{(Mary,fl,3.8F),(Mary,wt,3.9F),(Mary,sp,4.0F),(Mary,sm,4.0F)})
C = FOREACH B GENERATE A.name, AVG(A.gpa);
DUMP C;
({(John),(John),(John),(John)},3.850000023841858)
({(Mary),(Mary),(Mary),(Mary)},3.925000011920929)
</Example>
<Types>
<Type expression="int" result="long" />
<Type expression="long" result="long" />
<Type expression="float" result="double" />
<Type expression="double" result="double" />
<Type expression="chararray" result="" desc="error" />
<Type expression="bytearray" result="double" desc="cast as double" />
</Types>
</Function>
<Function name="CONCAT">
<Desc>Concatenates two expressions of identical type.</Desc>
<Syntax>CONCAT (expression, expression)</Syntax>
<Terms>
<Term name="expression">
<Desc>Any expression.</Desc>
</Term>
</Terms>
<Usage>
Use the CONCAT function to concatenate two expressions. The result values of the two expressions must have identical types.
</Usage>
<Example>
In this example fields f2 and f3 are concatenated.
A = LOAD 'data' as (f1:chararray, f2:chararray, f3:chararray);
DUMP A;
(apache,open,source)
(hadoop,map,reduce)
(pig,pig,latin)
X = FOREACH A GENERATE CONCAT(f2,f3);
DUMP X;
(opensource)
(mapreduce)
(piglatin)
</Example>
</Function>
<Function name="COUNT">
<Desc>Computes the number of elements in a bag.</Desc>
<Syntax>COUNT(expression)</Syntax>
<Terms>
<Term name="expression">
<Desc>
An expression with data type bag.
</Desc>
</Term>
</Terms>
<Usage>
Use the COUNT function to compute the number of elements in a bag. COUNT requires a preceding GROUP ALL statement for global counts and a GROUP BY statement for group counts.
The COUNT function follows syntax semantics and ignores nulls. What this means is that a tuple in the bag will not be counted if the FIRST FIELD in this tuple is NULL. If you want to include NULL values in the count computation, use COUNT_STAR.
Note: You cannot use the tuple designator (*) with COUNT; that is, COUNT(*) will not work.
</Usage>
<Example>
In this example the tuples in the bag are counted (see the GROUP operator for information about the field names in relation B).
A = LOAD 'data' AS (f1:int,f2:int,f3:int);
DUMP A;
(1,2,3)
(4,2,1)
(8,3,4)
(4,3,3)
(7,2,5)
(8,4,3)
B = GROUP A BY f1;
DUMP B;
(1,{(1,2,3)})
(4,{(4,2,1),(4,3,3)})
(7,{(7,2,5)})
(8,{(8,3,4),(8,4,3)})
X = FOREACH B GENERATE COUNT(A);
DUMP X;
(1L)
(2L)
(1L)
(2L)
</Example>
<Types>
<Type expression="int" result="long" />
<Type expression="long" result="long" />
<Type expression="float" result="long" />
<Type expression="double" result="long" />
<Type expression="chararray" result="long" />
<Type expression="bytearray" result="long" />
</Types>
</Function>
<Function name="COUNT_STAR">
<Desc>Computes the number of elements in a bag.</Desc>
<Syntax>COUNT_STAR(expression)</Syntax>
<Terms>
<Term name="expression">
<Desc>
An expression with data type bag.
</Desc>
</Term>
</Terms>
<Usage>
Use the COUNT_STAR function to compute the number of elements in a bag. COUNT_STAR requires a preceding GROUP ALL statement for global counts and a GROUP BY statement for group counts.
COUNT_STAR includes NULL values in the count computation (unlike COUNT, which ignores NULL values).
</Usage>
<Example>
In this example COUNT_STAR is used the count the tuples in a bag.
X = FOREACH B GENERATE COUNT_STAR(A);
</Example>
</Function>
<Function name="DIFF">
<Desc>Compares two fields in a tuple.</Desc>
<Syntax>DIFF (expression, expression)</Syntax>
<Terms>
<Term name="expression">
<Desc>
An expression with any data type.
</Desc>
</Term>
</Terms>
<Usage>
The DIFF function takes two bags as arguments and compares them. Any tuples that are in one bag but not the other are returned in a bag. If the bags match, an empty bag is returned. If the fields are not bags then they will be wrapped in tuples and returned in a bag if they do not match, or an empty bag will be returned if the two records match. The implementation assumes that both bags being passed to the DIFF function will fit entirely into memory simultaneously. If this is not the case the UDF will still function but it will be VERY slow.
</Usage>
<Example>
In this example DIFF compares the tuples in two bags.
A = LOAD 'bag_data' AS (B1:bag{T1:tuple(t1:int,t2:int)},B2:bag{T2:tuple(f1:int,f2:int)});
DUMP A;
({(8,9),(0,1)},{(8,9),(1,1)})
({(2,3),(4,5)},{(2,3),(4,5)})
({(6,7),(3,7)},{(2,2),(3,7)})
DESCRIBE A;
a: {B1: {T1: (t1: int,t2: int)},B2: {T2: (f1: int,f2: int)}}
X = FOREACH A DIFF(B1,B2);
grunt> dump x;
({(0,1),(1,1)})
({})
({(6,7),(2,2)})
</Example>
</Function>
<Function name="IsEmpty">
<Desc>Checks if a bag or map is empty.</Desc>
<Syntax>IsEmpty(expression)</Syntax>
<Terms>
<Term name="expression">
<Desc>
An expression with any data type.
</Desc>
</Term>
</Terms>
<Usage>
The IsEmpty function checks if a bag or map is empty (has no data). The function can be used to filter data.
</Usage>
<Example>
In this example all students with an SSN but no name are located.
SSN = load 'ssn.txt' using PigStorage() as (ssn:long);
SSN_NAME = load 'students.txt' using PigStorage() as (ssn:long, name:chararray);
/* do a left outer join of SSN with SSN_Name */
X = JOIN SSN by ssn LEFT OUTER, SSN_NAME by ssn;
/* only keep those ssn's for which there is no name */
Y = filter X by IsEmpty(SSN_NAME);
</Example>
</Function>
<Function name="MAX">
<Desc>Computes the maximum of the numeric values or chararrays in a single-column bag. MAX requires a preceding GROUP ALL statement for global maximums and a GROUP BY statement for group maximums.</Desc>
<Syntax>MAX(expression)</Syntax>
<Terms>
<Term name="expression">
<Desc>
An expression with data types int, long, float, double, or chararray.
</Desc>
</Term>
</Terms>
<Usage>
Use the MAX function to compute the maximum of the numeric values or chararrays in a single-column bag.
</Usage>
<Example>
In this example the maximum GPA for all terms is computed for each student (see the GROUP operator for information about the field names in relation B).
A = LOAD 'student' AS (name:chararray, session:chararray, gpa:float);
DUMP A;
(John,fl,3.9F)
(John,wt,3.7F)
(John,sp,4.0F)
(John,sm,3.8F)
(Mary,fl,3.8F)
(Mary,wt,3.9F)
(Mary,sp,4.0F)
(Mary,sm,4.0F)
B = GROUP A BY name;
DUMP B;
(John,{(John,fl,3.9F),(John,wt,3.7F),(John,sp,4.0F),(John,sm,3.8F)})
(Mary,{(Mary,fl,3.8F),(Mary,wt,3.9F),(Mary,sp,4.0F),(Mary,sm,4.0F)})
X = FOREACH B GENERATE group, MAX(A.gpa);
DUMP X;
(John,4.0F)
(Mary,4.0F)
</Example>
<Types>
<Type expression="int" result="int" />
<Type expression="long" result="long" />
<Type expression="float" result="float" />
<Type expression="double" result="double" />
<Type expression="chararray" result="chararray" />
<Type expression="bytearray" result="double" desc="cast as double" />
</Types>
</Function>
<Function name="MIN">
<Desc>Computes the minimum of the numeric values or chararrays in a single-column bag. MIN requires a preceding GROUP… ALL statement for global minimums and a GROUP … BY statement for group minimums.</Desc>
<Syntax>MIN(expression)</Syntax>
<Terms>
<Term name="expression">
<Desc>
An expression with data types int, long, float, double, or chararray.
</Desc>
</Term>
</Terms>
<Usage>
Use the MIN function to compute the minimum of a set of numeric values or chararrays in a single-column bag.
</Usage>
<Example>
In this example the minimum GPA for all terms is computed for each student (see the GROUP operator for information about the field names in relation B).
A = LOAD 'student' AS (name:chararray, session:chararray, gpa:float);
DUMP A;
(John,fl,3.9F)
(John,wt,3.7F)
(John,sp,4.0F)
(John,sm,3.8F)
(Mary,fl,3.8F)
(Mary,wt,3.9F)
(Mary,sp,4.0F)
(Mary,sm,4.0F)
B = GROUP A BY name;
DUMP B;
(John,{(John,fl,3.9F),(John,wt,3.7F),(John,sp,4.0F),(John,sm,3.8F)})
(Mary,{(Mary,fl,3.8F),(Mary,wt,3.9F),(Mary,sp,4.0F),(Mary,sm,4.0F)})
X = FOREACH B GENERATE group, MIN(A.gpa);
DUMP X;
(John,3.7F)
(Mary,3.8F)
</Example>
<Types>
<Type expression="int" result="int" />
<Type expression="long" result="long" />
<Type expression="float" result="float" />
<Type expression="double" result="double" />
<Type expression="chararray" result="chararray" />
<Type expression="bytearray" result="double" desc="cast as double" />
</Types>
</Function>
<Function name="SIZE">
<Desc>Computes the number of elements based on any Pig data type.</Desc>
<Syntax>SIZE(expression)</Syntax>
<Terms>
<Term name="expression">
<Desc>
An expression with any data type.
</Desc>
</Term>
</Terms>
<Usage>
Use the SIZE function to compute the number of elements based on the data type (see the Types Tables below). SIZE includes NULL values in the size computation. SIZE is not algebraic.
</Usage>
<Example>
In this example the number of characters in the first field is computed.
A = LOAD 'data' as (f1:chararray, f2:chararray, f3:chararray);
(apache,open,source)
(hadoop,map,reduce)
(pig,pig,latin)
X = FOREACH A GENERATE SIZE(f1);
DUMP X;
(6L)
(6L)
(3L)
</Example>
<Types>
<Type expression="int" result="1" desc="returns 1"/>
<Type expression="long" result="1" desc="returns 1"/>
<Type expression="float" result="1" desc="returns 1"/>
<Type expression="double" result="1" desc="returns 1"/>
<Type expression="chararray" result="" desc="returns number of characters in the array" />
<Type expression="bytearray" result="" desc="returns number of bytes in the array" />
<Type expression="tuple" result="" desc="returns number of fields in the tuple" />
<Type expression="bag" result="" desc="returns number of tuples in bag" />
<Type expression="map" result="" desc="returns number of key/value pairs in map" />
</Types>
</Function>
<Function name="SUM">
<Desc>Computes the sum of the numeric values in a single-column bag. SUM requires a preceding GROUP ALL statement for global sums and a GROUP BY statement for group sums.</Desc>
<Syntax>SUM(expression)</Syntax>
<Terms>
<Term name="expression">
<Desc>
An expression with data types int, long, float, double, or bytearray cast as double.
</Desc>
</Term>
</Terms>
<Usage>
Use the SUM function to compute the sum of a set of numeric values in a single-column bag.
</Usage>
<Example>
In this example the number of pets is computed. (see the GROUP operator for information about the field names in relation B).
A = LOAD 'data' AS (owner:chararray, pet_type:chararray, pet_num:int);
DUMP A;
(Alice,turtle,1)
(Alice,goldfish,5)
(Alice,cat,2)
(Bob,dog,2)
(Bob,cat,2)
B = GROUP A BY owner;
DUMP B;
(Alice,{(Alice,turtle,1),(Alice,goldfish,5),(Alice,cat,2)})
(Bob,{(Bob,dog,2),(Bob,cat,2)})
X = FOREACH B GENERATE group, SUM(A.pet_num);
DUMP X;
(Alice,8L)
(Bob,4L)
</Example>
<Types>
<Type expression="int" result="long" />
<Type expression="long" result="long" />
<Type expression="float" result="double" />
<Type expression="double" result="double" />
<Type expression="chararray" result="" desc="error" />
<Type expression="bytearray" result="double" desc="cast as double" />
</Types>
</Function>
<Function name="TOKENIZE">
<Desc>Splits a string and outputs a bag of words.</Desc>
<Syntax>TOKENIZE(expression [, 'field_delimiter'])</Syntax>
<Terms>
<Term name="expression">
<Desc>
An expression with data type chararray.
</Desc>
</Term>
<Term name="'field_delimiter'">
<Desc>
An optional field delimiter (in single quotes).
If field_delimiter is null or not passed, the following will be used as delimiters: space [ ], double quote [ " ], coma [ , ] parenthesis [ () ], star [ * ].
</Desc>
</Term>
</Terms>
<Usage>
Use the TOKENIZE function to split a string of words (all words in a single tuple) into a bag of words (each word in a single tuple).
</Usage>
<Example>
In this example the strings in each row are split.
A = LOAD 'data' AS (f1:chararray);
DUMP A;
(Here is the first string.)
(Here is the second string.)
(Here is the third string.)
X = FOREACH A GENERATE TOKENIZE(f1);
DUMP X;
({(Here),(is),(the),(first),(string.)})
({(Here),(is),(the),(second),(string.)})
({(Here),(is),(the),(third),(string.)})
In this example a field delimiter is specified.
{code}
A = LOAD 'data' AS (f1:chararray);
B = FOREACH A TOKENIZE (f1,'||');
DUMP B;
{code}
</Example>
</Function>
</Category>
<!-- Pig Eval Functions end -->
<!-- Pig Math Functions start -->
<Category name="Pig Math Functions">
<Function name="ABS">
<Desc>Returns the absolute value of an expression.</Desc>
<Syntax>ABS(expression)</Syntax>
<Terms>
<Term name="expression">
<Desc>
Any expression whose result is type int, long, float, or double.
</Desc>
</Term>
</Terms>
<Usage>
Use the ABS function to return the absolute value of an expression. If the result is not negative (x ≥ 0), the result is returned. If the result is negative (x less-than 0), the negation of the result is returned.
</Usage>
</Function>
<Function name="ACOS">
<Desc>Returns the arc cosine of an expression.</Desc>
<Syntax>ACOS(expression)</Syntax>
<Terms>
<Term name="expression">
<Desc>
An expression whose result is type double.
</Desc>
</Term>
</Terms>
<Usage>
Use the ACOS function to return the arc cosine of an expression.
</Usage>
</Function>
<Function name="ASIN">
<Desc>Returns the arc sine of an expression.</Desc>
<Syntax>ASIN(expression)</Syntax>
<Terms>
<Term name="expression">
<Desc>
An expression whose result is type double.
</Desc>
</Term>
</Terms>
<Usage>
Use the ASIN function to return the arc sine of an expression.
</Usage>
</Function>
<Function name="ATAN">
<Desc>Returns the arc tangent of an expression.</Desc>
<Syntax>ATAN(expression)</Syntax>
<Terms>
<Term name="expression">
<Desc>
An expression whose result is type double.
</Desc>
</Term>
</Terms>
<Usage>
Use the ATAN function to return the arc tangent of an expression.
</Usage>
</Function>
<Function name="CBRT">
<Desc>Returns the cube root of an expression.</Desc>
<Syntax>CBRT(expression)</Syntax>
<Terms>
<Term name="expression">
<Desc>
An expression whose result is type double.
</Desc>
</Term>
</Terms>
<Usage>
Use the CBRT function to return the cube root of an expression.
</Usage>
</Function>
<Function name="CEIL">
<Desc>Returns the value of an expression rounded up to the nearest integer.</Desc>
<Syntax>CEIL(expression)</Syntax>
<Terms>
<Term name="expression">
<Desc>
An expression whose result is type double.
</Desc>
</Term>
</Terms>
<Usage>
Use the CEIL function to return the value of an expression rounded up to the nearest integer. This function never decreases the result value.
x CEIL(x)
4.6 5
3.5 4
2.4 3
1.0 1
-1.0 -1
-2.4 -2
-3.5 -3
-4.6 -4
</Usage>
</Function>
<Function name="COS">
<Desc>Returns the trigonometric cosine of an expression.</Desc>
<Syntax>COS(expression)</Syntax>
<Terms>
<Term name="expression">
<Desc>
An expression (angle) whose result is type double.
</Desc>
</Term>
</Terms>
<Usage>
Use the COS function to return the trigonometric cosine of an expression.
</Usage>
</Function>
<Function name="COSH">
<Desc>Returns the hyperbolic cosine of an expression.</Desc>
<Syntax>COSH(expression)</Syntax>
<Terms>
<Term name="expression">
<Desc>
An expression whose result is type double.
</Desc>
</Term>
</Terms>
<Usage>
Use the COSH function to return the hyperbolic cosine of an expression.
</Usage>
</Function>
<Function name="EXP">
<Desc>Returns Euler's number e raised to the power of x.</Desc>
<Syntax>EXP(expression)</Syntax>
<Terms>
<Term name="expression">
<Desc>
An expression whose result is type double.
</Desc>
</Term>
</Terms>
<Usage>
Use the EXP function to return the value of Euler's number e raised to the power of x (where x is the result value of the expression).
</Usage>
</Function>
<Function name="FLOOR">
<Desc>Returns the value of an expression rounded down to the nearest integer.</Desc>
<Syntax>FLOOR(expression)</Syntax>
<Terms>
<Term name="expression">
<Desc>
An expression whose result is type double.
</Desc>
</Term>
</Terms>
<Usage>
Use the FLOOR function to return the value of an expression rounded down to the nearest integer. This function never increases the result value.
x CEIL(x)
4.6 4
3.5 3
2.4 2
1.0 1
-1.0 -1
-2.4 -3
-3.5 -4
-4.6 -5
</Usage>
</Function>
<Function name="LOG">
<Desc>Returns the natural logarithm (base e) of an expression.</Desc>
<Syntax>LOG(expression)</Syntax>
<Terms>
<Term name="expression">
<Desc>
An expression whose result is type double.
</Desc>
</Term>
</Terms>
<Usage>
Use the LOG function to return the natural logarithm (base e) of an expression.
</Usage>
</Function>
<Function name="LOG10">
<Desc>Returns the base 10 logarithm of an expression.</Desc>
<Syntax>LOG10(expression)</Syntax>
<Terms>
<Term name="expression">
<Desc>
An expression whose result is type double.
</Desc>
</Term>
</Terms>
<Usage>
Use the LOG10 function to return the base 10 logarithm of an expression.
</Usage>
</Function>
<Function name="RANDOM">
<Desc>Returns a pseudo random number.</Desc>
<Syntax>RANDOM()</Syntax>
<Terms>
<Term name="N/A">
<Desc>
No terms.
</Desc>
</Term>
</Terms>
<Usage>
Use the RANDOM function to return a pseudo random number (type double) greater than or equal to 0.0 and less than 1.0.
</Usage>
</Function>
<Function name="ROUND">
<Desc>Returns the value of an expression rounded to an integer.</Desc>
<Syntax>ROUND(expression)</Syntax>
<Terms>
<Term name="expression">
<Desc>
An expression whose result is type float or double.
</Desc>
</Term>
</Terms>
<Usage>
Use the ROUND function to return the value of an expression rounded to an integer (if the result type is float) or rounded to a long (if the result type is double).
x CEIL(x)
4.6 5
3.5 4
2.4 2
1.0 1
-1.0 -1
-2.4 -2
-3.5 -3
-4.6 -5
</Usage>
</Function>
<Function name="SIN">
<Desc>Returns the sine of an expression.</Desc>
<Syntax>SIN(expression)</Syntax>
<Terms>
<Term name="expression">
<Desc>
An expression whose result is type double.
</Desc>
</Term>
</Terms>
<Usage>
Use the SIN function to return the sine of an expession.
</Usage>
</Function>
<Function name="SINH">
<Desc>Returns the hyperbolic sine of an expression.</Desc>
<Syntax>SINH(expression)</Syntax>
<Terms>
<Term name="expression">
<Desc>
An expression whose result is type double.
</Desc>
</Term>
</Terms>
<Usage>
Use the SINH function to return the hyperbolic sine of an expression.
</Usage>
</Function>
<Function name="SQRT">
<Desc>Returns the positive square root of an expression.</Desc>
<Syntax>SQRT(expression)</Syntax>
<Terms>
<Term name="expression">
<Desc>
An expression whose result is type double.
</Desc>
</Term>
</Terms>
<Usage>
Use the SQRT function to return the positive square root of an expression.
</Usage>
</Function>
<Function name="TAN">
<Desc>Returns the trignometric tangent of an angle.</Desc>
<Syntax>TAN(expression)</Syntax>
<Terms>
<Term name="expression">
<Desc>
An expression (angle) whose result is double.
</Desc>
</Term>
</Terms>
<Usage>
Use the TAN function to return the trignometric tangent of an angle.
</Usage>
</Function>
<Function name="TANH">
<Desc>Returns the hyperbolic tangent of an expression.</Desc>
<Syntax>TANH(expression)</Syntax>
<Terms>
<Term name="expression">
<Desc>
An expression whose result is double.
</Desc>
</Term>
</Terms>
<Usage>
Use the TANH function to return the hyperbolic tangent of an expression.
</Usage>
</Function>
</Category>
<!-- Pig Math Functions end -->
<!-- Pig String Functions start -->
<Category name="Pig String Functions">
<Function name="INDEXOF">
<Desc>Returns the index of the first occurrence of a character in a string, searching forward from a start index.</Desc>
<Syntax>INDEXOF(string, 'character', startIndex)</Syntax>
<Terms>
<Term name="string">
<Desc>
The string to be searched.
</Desc>
</Term>
<Term name="'character'">
<Desc>
The character being searched for, in quotes.
</Desc>
</Term>
<Term name="startIndex">
<Desc>
The index from which to begin the forward search.
The string index begins with zero (0).
</Desc>
</Term>
</Terms>
<Usage>
Use the INDEXOF function to determine the index of the first occurrence of a character in a string. The forward search for the character begins at the designated start index.
</Usage>
</Function>
<Function name="LAST_INDEX_OF">
<Desc>Returns the index of the last occurrence of a character in a string, searching backward from a start index.</Desc>
<Syntax>LAST_INDEX_OF(expression)</Syntax>
<Terms>
<Term name="string">
<Desc>
The string to be searched.
</Desc>
</Term>
<Term name="'character'">
<Desc>
The character being searched for, in quotes.
</Desc>
</Term>
<Term name="startIndex">
<Desc>
The index from which to begin the backward search.
The string index begins with zero (0).
</Desc>
</Term>
</Terms>
<Usage>
Use the LAST_INDEX_OF function to determine the index of the last occurrence of a character in a string. The backward search for the character begins at the designated start index.
</Usage>
</Function>
<Function name="LCFIRST">
<Desc>Converts the first character in a string to lower case.</Desc>
<Syntax>LCFIRST(expression)</Syntax>
<Terms>
<Term name="expression">
<Desc>
An expression whose result type is chararray.
</Desc>
</Term>
</Terms>
<Usage>
Use the LCFIRST function to convert only the first character in a string to lower case.
</Usage>
</Function>
<Function name="LOWER">
<Desc>Converts all characters in a string to lower case.</Desc>
<Syntax>LOWER(expression)</Syntax>
<Terms>
<Term name="expression">
<Desc>
An expression whose result type is chararray.
</Desc>
</Term>
</Terms>
<Usage>
Use the LOWER function to convert all characters in a string to lower case.
</Usage>
</Function>
<Function name="REGEX_EXTRACT">
<Desc>Performs regular expression matching and extracts the matched group defined by an index parameter.</Desc>
<Syntax>REGEX_EXTRACT (string, regex, index)</Syntax>
<Terms>
<Term name="string">
<Desc>
The string in which to perform the match.
</Desc>
</Term>
<Term name="regex">
<Desc>
The regular expression.
</Desc>
</Term>
<Term name="index">
<Desc>
The index of the matched group to return.
</Desc>
</Term>
</Terms>
<Usage>
Use the REGEX_EXTRACT function to perform regular expression matching and to extract the matched group defined by the index parameter (where the index is a 1-based parameter.)
The function uses Java regular expression form.
The function returns a string that corresponds to the matched group in the position specified by the index. If there is no matched expression at that position, NULL is returned.
</Usage>
<Example>
This example will return the string '192.168.1.5'.
REGEX_EXTRACT('192.168.1.5:8020', '(.*)\:(.*)', 1);
</Example>
</Function>
<Function name="REGEX_EXTRACT_ALL">
<Desc>Performs regular expression matching and extracts all matched groups.</Desc>
<Syntax>REGEX_EXTRACT (string, regex)</Syntax>
<Terms>
<Term name="string">
<Desc>
The string in which to perform the match.
</Desc>
</Term>
<Term name="regex">
<Desc>
The regular expression.
</Desc>
</Term>
</Terms>
<Usage>
Use the REGEX_EXTRACT_ALL function to perform regular expression matching and to extract all matched groups. The function uses Java regular expression form.
The function returns a tuple where each field represents a matched expression. If there is no match, an empty tuple is returned.
</Usage>
<Example>
This example will return the tuple (192.168.1.5,8020).
REGEX_EXTRACT_ALL('192.168.1.5:8020', '(.*)\:(.*)');
</Example>
</Function>
<Function name="REPLACE">
<Desc>Replaces existing characters in a string with new characters.</Desc>
<Syntax>REPLACE(string, 'oldChar', 'newChar')</Syntax>
<Terms>
<Term name="string">
<Desc>
The string to be updated.
</Desc>
</Term>
<Term name="'oldChar'">
<Desc>
The existing characters being replaced, in quotes.
</Desc>
</Term>
<Term name="'newChar'">
<Desc>
The new characters replacing the existing characters, in quotes.
</Desc>
</Term>
</Terms>
<Usage>
Use the REPLACE function to replace existing characters in a string with new characters.
For example, to change "open source software" to "open source wiki" use this statement: REPLACE(string,'software','wiki');
</Usage>
</Function>
<Function name="STRSPLIT">
<Desc>Splits a string around matches of a given regular expression.</Desc>
<Syntax>STRSPLIT(string, regex, limit)</Syntax>
<Terms>
<Term name="string">
<Desc>
The string to be split.
</Desc>
</Term>
<Term name="regex">
<Desc>
The regular expression.
</Desc>
</Term>
<Term name="Limit">
<Desc>
The number of times the pattern (the compiled representation of the regular expression) is applied.
</Desc>
</Term>
</Terms>
<Usage>
Use the STRSPLIT function to split a string around matches of a given regular expression.
For example, given the string (open:source:software), STRSPLIT (string, ':',2) will return ((open,source:software)) and STRSPLIT (string, ':',3) will return ((open,source,software)).
</Usage>
</Function>
<Function name="SUBSTRING">
<Desc>Returns a substring from a given string.</Desc>
<Syntax>SUBSTRING(string, startIndex, stopIndex)</Syntax>
<Terms>
<Term name="string">
<Desc>
The string from which a substring will be extracted.
</Desc>
</Term>
<Term name="startIndex">
<Desc>
The index (type integer) of the first character of the substring.
The index of a string begins with zero (0).
</Desc>
</Term>
<Term name="stopIndex">
<Desc>
The index (type integer) of the character following the last character of the substring.
</Desc>
</Term>
</Terms>
<Usage>
Use the SUBSTRING function to return a substring from a given string.
Given a field named alpha whose value is ABCDEF, to return substring BCD use this statement: SUBSTRING(alpha,1,4). Note that 1 is the index of B (the first character of the substring) and 4 is the index of E (the character following the last character of the substring).
</Usage>
</Function>
<Function name="TRIM">
<Desc>Returns a copy of a string with leading and trailing white space removed.</Desc>
<Syntax>TRIM(expression)</Syntax>
<Terms>
<Term name="expression">
<Desc>
An expression whose result is chararray.
</Desc>
</Term>
</Terms>
<Usage>
Use the TRIM function to remove leading and trailing white space from a string.
</Usage>
</Function>
<Function name="UCFIRST">
<Desc>Returns a string with the first character converted to upper case.</Desc>
<Syntax>UCFIRST(expression)</Syntax>
<Terms>
<Term name="expression">
<Desc>
An expression whose result is chararray.
</Desc>
</Term>
</Terms>
<Usage>
Use the UCFIRST function to convert only the first character in a string to upper case.
</Usage>
</Function>
<Function name="UPPER">
<Desc>Returns a string converted to upper case.</Desc>
<Syntax>UPPER(expression)</Syntax>
<Terms>
<Term name="expression">
<Desc>
An expression whose result is chararray.
</Desc>
</Term>
</Terms>
<Usage>
Use the UPPER function to convert all characters in a string to upper case.
</Usage>
</Function>
</Category>
<!-- Pig String Functions end -->
<!-- Pig Tuple Bag Map Functions start
<Category name="Pig Tuple Bag Map Functions">
<Function name="TOTUPLE">
<Desc>Converts one or more expressions to type tuple.</Desc>
<Syntax>TOTUPLE(expression [, expression ...])</Syntax>
<Terms>
<Term name="expression">
<Desc>
An expression of any datatype.
</Desc>
</Term>
</Terms>
<Usage>
Use the TOTUPLE function to convert one or more expressions to a tuple.
See also: Tuple data type and Type Construction Operators
</Usage>
<Example>
In this example, fields f1, f2 and f3 are converted to a tuple.
a = LOAD 'student' AS (f1:chararray, f2:int, f3:float);
DUMP a;
(John,18,4.0)
(Mary,19,3.8)
(Bill,20,3.9)
(Joe,18,3.8)
b = FOREACH a GENERATE TOTUPLE(f1,f2,f3);
DUMP b;
((John,18,4.0))
((Mary,19,3.8))
((Bill,20,3.9))
((Joe,18,3.8))
</Example>
</Function>
<Function name="TOBAG">
<Desc>Converts one or more expressions to type bag.</Desc>
<Syntax>TOBAG(expression [, expression ...])</Syntax>
<Terms>
<Term name="expression">
<Desc>
An expression with any data type.
</Desc>
</Term>
</Terms>
<Usage>
Use the TOBAG function to convert one or more expressions to individual tuples which are then placed in a bag.
See also: Bag data type and Type Construction Operators
</Usage>
<Example>
In this example, fields f1 and f3 are converted to tuples that are then placed in a bag.
a = LOAD 'student' AS (f1:chararray, f2:int, f3:float);
DUMP a;
(John,18,4.0)
(Mary,19,3.8)
(Bill,20,3.9)
(Joe,18,3.8)
b = FOREACH a GENERATE TOBAG(f1,f3);
DUMP b;
({(John),(4.0)})
({(Mary),(3.8)})
({(Bill),(3.9)})
({(Joe),(3.8)})
</Example>
</Function>
<Function name="TOMAP">
<Desc>Converts key/value expression pairs into a map.</Desc>
<Syntax>TOMAP(key-expression, value-expression [, key-expression, value-expression ...])</Syntax>
<Terms>
<Term name="key-expression">
<Desc>
An expression of type chararray.
</Desc>
</Term>
<Term name="value-expression">
<Desc>
An expression of any type supported by a map.
</Desc>
</Term>
</Terms>
<Usage>
Use the TOMAP function to convert pairs of expressions into a map. Note the following:
You must supply an even number of expressions as parameters
The elements must comply with map type rules:
Every odd element (key-expression) must be a chararray since only chararrays can be keys into the map
Every even element (value-expression) can be of any type supported by a map.
See also: Map data type and Type Construction Operators
</Usage>
<Example>
In this example, student names (type chararray) and student GPAs (type float) are used to create three maps.
A = load 'students' as (name:chararray, age:int, gpa:float);
B = foreach A generate TOMAP(name, gpa);
store B into results;
Input (students)
joe smith 20 3.5
amy chen 22 3.2
leo allen 18 2.1
Output (results)
[joe smith#3.5]
[amy chen#3.2]
[leo allen#2.1]
</Example>
</Function>
<Function name="TOP">
<Desc>Returns the top-n tuples from a bag of tuples.</Desc>
<Syntax>TOP(topN,column,relation)</Syntax>
<Terms>
<Term name="topN">
<Desc>
The number of top tuples to return (type integer).
</Desc>
</Term>
<Term name="column">
<Desc>
The tuple column whose values are being compared.
</Desc>
</Term>
<Term name="relation">
<Desc>
The relation (bag of tuples) containing the tuple column.
</Desc>
</Term>
</Terms>
<Usage>
TOP function returns a bag containing top N tuples from the input bag where N is controlled by the first parameter to the function. The tuple comparison is performed based on a single column from the tuple. The column position is determined by the second parameter to the function. The function assumes that all tuples in the bag contain an element of the same type in the compared column
</Usage>
<Example>
In this example the top 10 occurrences are returned.
A = LOAD 'data' as (first: chararray, second: chararray);
B = GROUP A BY (first, second);
C = FOREACH B generate FLATTEN(group), COUNT(*) as count;
D = GROUP C BY first; // again group by first
topResults = FOREACH D {
result = TOP(10, 2, C); // and retain top 10 occurrences of 'second' in first
GENERATE FLATTEN(result);
}
</Example>
</Function>
</Category>
Pig Tuple Bag Map Functions end -->
</Functions>