mirror of
https://github.com/apache/impala.git
synced 2025-12-19 09:58:28 -05:00
Change-Id: Iae921f6554c7010f9568ee4a42b4abcb3534d4a6 Reviewed-on: http://gerrit.cloudera.org:8080/21629 Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com> Reviewed-by: Yida Wu <wydbaggio000@gmail.com>
229 lines
14 KiB
XML
<?xml version="1.0" encoding="UTF-8"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
<concept id="impala_ai_functions">
  <title>Advantages and use cases of Impala AI functions</title>
  <titlealts audience="PDF">
    <navtitle>AI Functions</navtitle>
  </titlealts>
  <prolog>
    <metadata>
      <data name="Category" value="Impala"/>
      <data name="Category" value="Impala Functions"/>
      <data name="Category" value="UDF"/>
      <data name="Category" value="Data Analysts"/>
      <data name="Category" value="Developers"/>
      <data name="Category" value="Querying"/>
    </metadata>
  </prolog>
  <conbody>
    <p>You can use Impala's <codeph>ai_generate_text</codeph> function to access Large Language
      Models (LLMs) in SQL queries. The function takes a prompt, retrieves the LLM response, and
      includes it in the query results. You can also create custom UDFs for complex tasks such as
      sentiment analysis and translation.</p>
    <section id="section_ufg_4tv_fcc">
      <title>Use LLMs directly in SQL with Impala's <codeph>ai_generate_text</codeph>
        function</title>
      <p>Impala provides a built-in AI function called <codeph>ai_generate_text</codeph> that
        lets you call Large Language Models (LLMs) directly from SQL queries. You pass the
        function a prompt, which may include data from your tables. The function sends the prompt
        to a supported LLM endpoint, retrieves the response, and includes it in the query
        result.</p>
      <p>Alternatively, you can integrate LLM calls into your Impala workflow by creating custom
        User Defined Functions (UDFs) on top of <codeph>ai_generate_text</codeph>. A UDF wraps the
        prompt, so a concise SQL statement is enough to send it to an LLM and receive the
        response. You can define UDFs for complex tasks like sentiment analysis, language
        translation, and generative contextual analysis.</p>
    </section>
    <section id="ai_advantages">
      <title>Advantages of using AI functions</title>
      <p>
        <ul id="ul_cdc_fm4_tbc">
          <li><b>Simplified Workflow</b>: Removes the need to set up complex data
            pipelines.</li>
          <li><b>No ML Expertise Required</b>: No specialized machine learning skills are
            needed.</li>
          <li><b>Swift Decision-Making</b>: In-database function calls give faster insights into
            the data, which supports critical business decisions.</li>
          <li><b>Integrated Functionality</b>: Requires no external applications, as it is a
            built-in feature in Data Warehouse.</li>
        </ul>
      </p>
    </section>
    <section id="ai_usecases">
      <title>List of possible use cases</title>
      <p>Here are some practical applications of using AI models with the function:<ul
          id="ul_ond_pm4_tbc">
          <li><b>Sentiment Analysis</b>: Use the AI model to examine customer reviews for a product
            and identify their sentiment as positive, negative, or neutral.</li>
          <li><b>Language Translation</b>: Translate product reviews written in different languages
            to understand customer feedback from various regions.</li>
          <li><b>Generative Contextual Analysis</b>: Generate detailed reports and insights on
            various topics based on provided data.</li>
        </ul></p>
    </section>
    <section id="syntax_ai_function">
      <title>Syntax for AI built-in function arguments</title>
      <p>The built-in AI functions use the OpenAI API to communicate with a large language model.
        Currently, OpenAI's public endpoint and Azure OpenAI endpoints are supported.</p>
      <dl id="dl_w5l_nn4_tbc">
        <dlentry>
          <dt>AI_GENERATE_TEXT_DEFAULT</dt>
          <dd>Syntax:<codeblock id="codeblock_ibj_qn4_tbc">ai_generate_text_default(prompt)</codeblock></dd>
        </dlentry>
        <dlentry>
          <dt>AI_GENERATE_TEXT</dt>
          <dd>Syntax:<codeblock id="codeblock_qdg_zn4_tbc">ai_generate_text(ai_endpoint, prompt, ai_model, ai_api_key_jceks_secret, additional_params)</codeblock></dd>
        </dlentry>
      </dl>
      <p>The <codeph>ai_generate_text</codeph> function uses the values you pass for
        <codeph>ai_endpoint</codeph>, <codeph>ai_model</codeph>, and
        <codeph>ai_api_key_jceks_secret</codeph>. If any of these arguments is empty or NULL, the
        function uses the default defined at the instance level, which is the value of the
        corresponding flag configured in the Impala instance. For example, if the
        <codeph>ai_endpoint</codeph> argument is NULL or empty, the function uses the value of the
        <codeph>ai_endpoint</codeph> flag.</p>
      <p dir="ltr">The <codeph>ai_generate_text_default</codeph> function takes only a prompt, so
        make sure to set all three flags (<codeph>ai_endpoint</codeph>, <codeph>ai_model</codeph>,
        and <codeph>ai_api_key_jceks_secret</codeph>) to appropriate values in the coordinator and
        executor flagfiles.</p>
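      <p>As a sketch of the fallback behavior described above, the following two calls send the
        same prompt. The first supplies the endpoint, model, and secret explicitly. The second
        passes NULL for those three arguments so that the instance-level flag values apply. The
        endpoint, model name, and secret name (<codeph>open-ai-key</codeph>) are illustrative
        values borrowed from other examples in this topic, and passing an empty string for
        <codeph>additional_params</codeph> is an assumption. Adjust both for your deployment.</p>
      <codeblock>-- Explicit endpoint, model, and secret (illustrative values).
select ai_generate_text('https://api.openai.com/v1/chat/completions', 'hello', 'gpt-3.5-turbo', 'open-ai-key', '');

-- NULL endpoint, model, and secret: the corresponding flag values are used instead.
select ai_generate_text(NULL, 'hello', NULL, NULL, '');
</codeblock>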
</section>
    <section id="key_parameters_ai_function">
      <title>Key parameters for using the AI model</title>
      <ul id="ul_pfm_2p4_tbc">
        <li><b>ai_endpoint</b>: The endpoint of the model API to call, for example,
          https://api.openai.com/v1/chat/completions. Services such as OpenAI and Azure OpenAI
          Service are supported.</li>
        <li><b>prompt</b>: The text you submit to the AI model to generate a response.</li>
        <li><b>ai_model</b>: The name of the model you want to use within that API, for
          example, gpt-3.5-turbo.</li>
        <li><b>ai_api_key_jceks_secret</b>: The key name of the JCEKS secret that contains your API
          key for the AI API you are using. The secret must exist in a JCEKS keystore that Impala
          can read. To make the keystore available, set the
          <codeph>hadoop.security.credential.provider.path</codeph> property in the
          <codeph>core-site</codeph> configuration for both the executor and coordinator.</li>
        <li><b>additional_params</b>: Additional parameters offered by the AI API, passed to the
          built-in function as a JSON object.</li>
      </ul>
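      <p>For illustration, the following sketch passes a JSON object in
        <codeph>additional_params</codeph> to set the sampling temperature, mirroring the
        temperature setting used in the UDF example in this topic. It assumes that the endpoint,
        model, and secret flags are already configured, so those arguments are passed as NULL. The
        prompt text is made up for the example, and which JSON keys are accepted depends on the AI
        API you call.</p>
      <codeblock>select ai_generate_text(NULL, 'Summarize Impala in one sentence.', NULL, NULL, '{"temperature": 0.9}');
</codeblock>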
</section>
    <section id="impala-built-in-ai-function">
      <title>Examples of using the built-in AI function</title>
      <p>The following example sends a prompt to the LLM using only the built-in function
        <codeph>ai_generate_text_default</codeph>.<codeblock id="codeblock_ppy_dr4_tbc">> select ai_generate_text_default('hello');
Response:
Hello! How can I assist you today?
</codeblock></p>
      <p>In the following example, a query is run against the Amazon book reviews database for the
        book titled Artificial Superintelligence. The large language model (LLM) is prompted to
        classify the sentiment as positive, neutral, or
        negative.<codeblock id="codeblock_hyp_4np_tbc">> select customer_id, star_rating, ai_generate_text_default(CONCAT('Classify the following review as positive, neutral, or negative and only include the uncapitalized category in the response: ', review_body)) AS review_analysis, review_body from amazon_book_reviews where product_title='Artificial Superintelligence' order by customer_id LIMIT 1;
Response:
+--+------------+------------+----------------+------------------+
|  |customer_id |star_rating |review_analysis |review_body       |
+--+------------+------------+----------------+------------------+
|1 |4343565     |     5      |positive        |What is this book |
|  |            |            |                |all about …………    |
+--+------------+------------+----------------+------------------+
</codeblock></p>
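      <p>The same pattern could be applied to the language translation use case. The following is
        a sketch only: it reuses the <codeph>amazon_book_reviews</codeph> table and
        <codeph>review_body</codeph> column from the example above, and the prompt wording and the
        <codeph>review_translation</codeph> alias are assumptions that you would tune for your
        model and data.</p>
      <codeblock>select customer_id, ai_generate_text_default(CONCAT('Translate the following review into English and only include the translation in the response: ', review_body)) AS review_translation from amazon_book_reviews where product_title='Artificial Superintelligence' order by customer_id LIMIT 1;
</codeblock>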
</section>
    <section id="impala-custom-udf">
      <title>Examples of creating and using custom UDFs along with the built-in AI function</title>
      <p>Instead of writing the prompt in each SQL query, you can build a UDF that contains your
        intended prompt. Inside the UDF, you pass the prompt to a built-in Impala AI function such
        as <codeph>ai_generate_text_default</codeph>.</p>
      <p>Example: Classify input customer reviews</p>
      <p>The following UDF takes a review from the Amazon book reviews database as input and asks
        the LLM to classify its sentiment.</p>
      <codeblock id="codeblock_h1z_yrp_tbc">
IMPALA_UDF_EXPORT
StringVal ClassifyReviews(FunctionContext* context, const StringVal&amp; input) {
  // Build the prompt: fixed instructions followed by the review text.
  std::string request =
      std::string("Classify the following review as positive, neutral, or negative")
      + std::string(" and only include the uncapitalized category in the response: ")
      + std::string(reinterpret_cast&lt;const char*&gt;(input.ptr), input.len);
  StringVal prompt(request.c_str());
  const StringVal endpoint("https://api.openai.com/v1/chat/completions");
  const StringVal model("gpt-3.5-turbo");
  // Name of the JCEKS secret that holds the API key.
  const StringVal api_key_jceks_secret("open-ai-key");
  const StringVal params("{\"temperature\": 0.9, \"model\": \"gpt-4\"}");
  // Delegate to the built-in AI function.
  return context->Functions()->ai_generate_text(
      context, endpoint, prompt, model, api_key_jceks_secret, params);
}
</codeblock>
      <p>You can define prompt-building UDFs like this one and build them for Impala. Once they are
        registered, you can use them to query your datasets.</p>
      <p>Creating the <codeph>analyze_reviews</codeph> function:</p>
      <codeblock id="codeblock_b4l_ctp_tbc">> CREATE FUNCTION analyze_reviews(STRING)
RETURNS STRING
LOCATION 's3a://dw-...............'
SYMBOL='ClassifyReviews';
</codeblock>
      <p>Using a SELECT query for sentiment analysis to classify Amazon book reviews:</p>
      <codeblock id="codeblock_xdr_jtp_tbc">> SELECT customer_id, star_rating, analyze_reviews(review_body) AS review_analysis, review_body from amazon_book_reviews where product_title='Artificial Superintelligence' order by customer_id;
Response:
+--+------------+------------+----------------+----------------------+
|  |customer_id |star_rating |review_analysis |review_body           |
+--+------------+------------+----------------+----------------------+
|1 |44254093    |     5      |positive        |What is this book all |
|  |            |            |                |about? It is all about|
|  |            |            |                |a mind-blowing        |
|  |            |            |                |universal law of      |
|  |            |            |                |nature. Mind-blow…    |
+--+------------+------------+----------------+----------------------+
|2 |50050072    |     5      |positive        |The two tightly-      |
|  |            |            |                |connected ideas strike|
|  |            |            |                |you as amazed. In the |
|  |            |            |                |first place, what has |
|  |            |            |                |never bef…            |
+--+------------+------------+----------------+----------------------+
|3 |50050072    |     5      |positive        |The two tightly-      |
|  |            |            |                |connected ideas strike|
|  |            |            |                |you as amazed. In the |
|  |            |            |                |first place, what has |
|  |            |            |                |never bef…            |
+--+------------+------------+----------------+----------------------+
|4 |52932308    |     1      |negative        |This book is seriously|
|  |            |            |                |flawed. I could not   |
|  |            |            |                |work out if the author|
|  |            |            |                |was a mathemetician   |
|  |            |            |                |dabbi…                |
+--+------------+------------+----------------+----------------------+
|5 |52971961    |     1      |negative        |Abdoullaev's          |
|  |            |            |                |exploration of        |
|  |            |            |                |Al issues appears to  |
|  |            |            |                |be very technological |
|  |            |            |                |and straightforward…  |
+--+------------+------------+----------------+----------------------+
|6 |53008416    |     4      |positive        |As Co Founder of      |
|  |            |            |                |ArtilectWorld:ultra   |
|  |            |            |                |intelligent machine,  |
|  |            |            |                |I recommend reading   |
|  |            |            |                |this book!            |
+--+------------+------------+----------------+----------------------+
</codeblock>
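      <p>Because the UDF is called like any other scalar function, its output can feed further
        SQL. As a hedged sketch, the following query counts reviews per sentiment category. It
        assumes the <codeph>analyze_reviews</codeph> function created above and that the LLM
        returns exactly one of the three categories; the <codeph>classified</codeph> and
        <codeph>num_reviews</codeph> names are made up for the example. The function is evaluated
        for each input row, so consider restricting the rows you send to the LLM.</p>
      <codeblock>SELECT review_analysis, count(*) AS num_reviews
FROM (
  SELECT analyze_reviews(review_body) AS review_analysis
  FROM amazon_book_reviews
  WHERE product_title='Artificial Superintelligence'
) classified
GROUP BY review_analysis
ORDER BY num_reviews DESC;
</codeblock>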
</section>
  </conbody>
</concept>