[DO NOT MERGE] MSFT Build Copilot coding agent megabranch (#55536)
Co-authored-by: hubwriter <hubwriter@github.com>
Co-authored-by: Felicity Chapman <felicitymay@github.com>
Co-authored-by: Joe Clark <31087804+jc-clark@users.noreply.github.com>
Co-authored-by: Isaac Brown <101839405+isaacmbrown@users.noreply.github.com>
Co-authored-by: Anne-Marie <102995847+am-stead@users.noreply.github.com>
Co-authored-by: felix <guntrip@github.com>
Co-authored-by: Felix Guntrip <stevecat@github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Laura Coursen <lecoursen@github.com>
Co-authored-by: Tim Rogers <timrogers@github.com>
Co-authored-by: Sarah Schneider <sarahs@users.noreply.github.com>
Co-authored-by: Sophie <29382425+sophietheking@users.noreply.github.com>
Co-authored-by: Nhu Do <nhu-do@github.com>
Co-authored-by: Ellie Bennett <34038869+elliebennett@users.noreply.github.com>
Co-authored-by: Janice <janiceilene@github.com>
Co-authored-by: Melanie Yarbrough <11952755+myarb@users.noreply.github.com>
Co-authored-by: Sarah Schneider <sarahs@github.com>
Co-authored-by: Ben Ahmady <32935794+subatoi@users.noreply.github.com>
Co-authored-by: Sunbrye Ly <56200261+sunbrye@users.noreply.github.com>
Co-authored-by: Steve Ward <steveward@github.com>
Co-authored-by: Vanessa <vgrl@github.com>
Co-authored-by: mc <42146119+mchammer01@users.noreply.github.com>
Co-authored-by: Kate Catlin <KateCatlin@users.noreply.github.com>
Co-authored-by: Claire W <78226508+crwaters16@users.noreply.github.com>
content/github-models/use-github-models/evaluating-ai-models.md (new file, 264 lines)
@@ -0,0 +1,264 @@
---
title: Evaluating AI models
intro: 'Test and compare AI model outputs using evaluators and scoring metrics in GitHub Models.'
versions:
  feature: github-models
shortTitle: Evaluate AI models
topics:
  - GitHub Models
---

## Overview

{% data variables.product.prodname_github_models %} provides a simple evaluation workflow that helps developers compare large language models (LLMs), refine prompts, and make data-driven decisions within the GitHub platform. You can use {% data variables.product.prodname_github_models %} to experiment with new features or validate model changes by analyzing performance, accuracy, and cost through structured evaluation tools.

## Use cases for {% data variables.product.prodname_github_models %}

Model behavior can vary widely based on the prompt, input, or configuration. {% data variables.product.prodname_github_models %} helps you:

* Test and compare multiple LLMs across realistic use cases.
* Optimize prompt phrasing, temperature, and other parameters.
* Evaluate model outputs using structured, repeatable metrics.
* Integrate AI development into your existing development workflow.

## Example scenario

Consider a scenario where you're building a feature to summarize customer feedback submitted through support tickets. These summaries will be used to generate internal reports and tickets, so the output needs to be clear, relevant, and concise.

You want to:

* Experiment with different models and prompt configurations.
* Evaluate the best-performing configuration based on quality, consistency, and efficiency.
* Save the configuration to your repository for reuse and collaboration.

## Prompt testing in the Playground

To familiarize yourself with how to create and manage prompts in {% data variables.product.prodname_github_models %}, see [Testing a prompt in the Playground](/github-models/use-github-models/optimizing-your-ai-powered-app-with-github-models#testing-a-prompt-in-the-playground).

The Playground lets you compare models side by side, adjust their parameters, and test prompt variations.

In this step, you'll configure a model to generate summaries for customer support feedback. You'll define a system prompt, test it with a sample input, and refine it to ensure the output is concise and relevant.

### Defining the system prompt

Define the model's behavior for your current goal. In this case, the goal is to summarize customer feedback. Under **Parameters**, enter the following System prompt:

You are a helpful assistant that summarizes support ticket responses into concise summaries.

Leave the remaining settings set to their defaults.



### Writing a user prompt

Now that the model is set up, enter the following customer feedback into the **Prompt** dialog:

The app crashes every time I try to upload a PDF from my phone. It works on desktop but not on mobile.

The model may generate a response like the following:

The user experiences consistent app crashes when attempting to upload a PDF from their phone. Uploading PDFs works normally on desktop. They request an investigation into the issue.

### Defining the input variable

At this point, the configuration generates a clear and concise summary. At the bottom of the **Parameters** settings, click **Create prompt.yml file** to open the Prompt view. Your system prompt will be prefilled automatically.

In the **User prompt** field, enter the following prompt:

```text
Summarize the following: {% raw %}{{input}}{% endraw %}
```

The {% raw %}`{{input}}`{% endraw %} variable acts as a placeholder that lets you test different inputs (customer feedback) without modifying the prompt each time. Each test input you add will replace {% raw %}`{{input}}`{% endraw %} when the comparison runs.
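
For reference, the Prompt view saves this configuration as a prompt file in your repository. The sketch below shows roughly how the pieces defined so far (system prompt, templated user prompt, and model) fit together in such a file; the field names are assumptions for illustration, and the file generated by **Create prompt.yml file** is the source of truth.

```yaml
# Hypothetical sketch of a generated prompt.yml for this scenario.
# Field names are assumptions; rely on the file that the Prompt view generates.
name: Support ticket summarizer
description: Summarizes customer support feedback into concise summaries
model: openai/gpt-4o
messages:
  - role: system
    content: You are a helpful assistant that summarizes support ticket responses into concise summaries.
  - role: user
    content: 'Summarize the following: {% raw %}{{input}}{% endraw %}'
```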

## Adding test inputs

At the top of the Prompts view, select **Compare** to toggle the Comparisons view. This view allows you to run structured comparisons across multiple prompts or models and apply evaluators to measure performance.



In the Comparisons view, each row in the table represents a single test case, with a specific input and an expected output. Each column presents a different prompt configuration to compare how various models or prompt styles perform using evaluators.

Click **Add rows** to enter your test data. The input simulates a real support message, and the expected output represents the ideal summary the model should return. The table below provides sample test inputs and their corresponding expected outputs for evaluation.

| Row | **Input** | **Expected output** |
|-----|-----------|---------------------|
| 1 | The app crashes every time I try to upload a PDF from my phone. It works on desktop but not on mobile. | User reports that the mobile app crashes every time they try to upload a PDF, while the desktop version works without issues. |
| 2 | I contacted support two days ago and haven't heard back. I need help recovering my account ASAP. | User is awaiting support response and urgently needs account recovery assistance. |
| 3 | Please add dark mode. It's very hard to use at night. My eyes hurt after prolonged use. | User requests dark mode due to eye strain during nighttime use. |

### Adjusting model parameters

To the right of the table, click {% octicon "plus" aria-label="Add prompt configuration" %} to add a new prompt configuration.

Within the new prompt configuration, you can update the model and fine-tune its behavior using the available parameters settings. These settings control how the model generates text, including its length, randomness, and repetition.

### Configuring the model

From the Model dropdown, choose Phi-4 to create a distinct configuration for comparison.

You can adjust the following parameters to influence the model's output:

* **Max Tokens**: Sets the maximum number of tokens the model can return. Higher values allow longer outputs.
* **Temperature**: Controls randomness in the response. Lower values (0.2–0.4) produce more focused, deterministic outputs. Higher values (0.8–1.0) introduce more variation and creativity.
* **Top P**: Controls output diversity by selecting from a pool of the most probable next words. Lower values reduce variability, similar to lowering temperature.
* **Presence Penalty**: Discourages the model from introducing new topics. Higher values apply a stronger penalty. A value of 0 is typically suitable for summarization.
* **Frequency Penalty**: Reduces the likelihood of repeating words. Higher values apply a stronger penalty. A value between 0 and 0.5 helps keep summaries clear and free of redundancy.
* **Stop**: Specifies one or more strings that, when generated, will cut off the model's response. Use this to prevent overly long outputs or enforce formatting rules.

The table below provides a parameter configuration for generating concise summaries during model comparison.

| Parameter | Value | Reason |
|-------------------|------------|-------------------------------------------------------------|
| Max Tokens | 128 | Keeps responses short and on-topic |
| Temperature | 0.3 | Ensures deterministic, focused outputs |
| Top P | 1.0 | Allows full vocabulary but keeps selection guided |
| Presence Penalty | 0 | No penalty; summaries don't need topic variation |
| Frequency Penalty | 0.3 | Reduces repeated phrasing in compact summaries |
| Stop | (optional) | Use if you want to end output after a keyword or symbol |

After applying the parameters, you can add additional columns to compare more models or prompt configurations side by side.
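
If you later commit this configuration as a prompt file (see [AUTOTITLE](/github-models/use-github-models/storing-prompts-in-github-repositories)), the same settings can be recorded alongside the prompt. The snippet below is only a rough sketch of the table above expressed as stored model parameters; the key names and the `microsoft/phi-4` model identifier are assumptions for illustration, not confirmed by this article.

```yaml
# Hypothetical sketch: the parameter table above as stored model parameters.
# Key and model names are assumptions.
model: microsoft/phi-4
model_parameters:
  max_tokens: 128
  temperature: 0.3
  top_p: 1.0
  presence_penalty: 0
  frequency_penalty: 0.3
```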

## Evaluating outputs

Once the prompt is configured, run a structured evaluation to compare model outputs using real data and repeatable metrics.

Model evaluation helps you understand how different models and prompt configurations perform across real inputs. In the Prompt view, you can apply evaluators to multiple models side by side and review metrics such as similarity, relevance, and groundedness.

The following evaluators are available:

* **Similarity**: Measures how closely a model's output matches an expected or reference answer. This is useful when you want to confirm that the model returns consistent and accurate responses aligned with a known result. The score ranges from 0 to 1, with higher values indicating greater similarity.
* **Relevance**: Refers to how effectively a response addresses a question. It assesses the accuracy, completeness, and direct relevance of the response based solely on the given information. The score ranges from 0 to 1, with higher values indicating stronger alignment with the input's intent.
* **Groundedness**: Measures how well an answer is anchored in the provided context, evaluating its relevance, accuracy, and completeness based exclusively on that context. It assesses the extent to which the answer fully addresses the question without introducing unrelated or incorrect information. The score ranges from 0 to 1, with higher values indicating higher accuracy.
* **Custom prompt**: Lets you define your own evaluation criteria for one LLM to assess the output of another. This allows you to score model outputs based on your own guidelines. You can choose between pass/fail or scored evaluations, making it ideal for scenarios where standard metrics do not capture testing expectations.

When you're ready to evaluate, click **Run** to generate and compare the outputs across all prompt configurations. After the run completes, {% data variables.product.prodname_github_models %} displays the output from each prompt configuration, along with evaluator scores.



## Test case: PDF upload crash

**Input**: `The app crashes every time I try to upload a PDF from my phone. It works on desktop but not on mobile.`

The following tables display each model's output and its evaluator scores:

| Model | Output |
|-------|--------|
| GPT-4.1 | User reports that the app crashes when uploading a PDF from mobile, though uploading works on desktop. |
| DeepSeek-R1 | <think> Okay, the user is having trouble with the app crashing when they try to upload a PDF from their phone. It works on the desktop but not mobile. Let me break this down. First, the main issue is the app crashing specifically during PDF uploads on mobile. The fact that it works on desktop suggests it's a mobile-specific problem. Possible causes could be related to the mobile app version, the phone's OS, file size limits on mobile, or compatibility issues with certain PDF formats on mobile devices. I need to make sure the summary mentions the crash during PDF uploads on mobile, note that desktop works, and |
| Phi-4 | The app crashes when attempting to upload a PDF from the mobile device, though it functions correctly on the desktop version. |

| Model | Similarity | Relevance | Groundedness | Input tokens | Output tokens | Latency |
|---------------|------------|-----------|--------------|--------------|---------------|---------|
| GPT-4.1 | 100% | 50% | 100% | 61 | 20 | 918ms |
| DeepSeek-R1 | 50% | 50% | 75% | 52 | 128 | 2285ms |
| Phi-4 | 75% | 100% | 100% | 61 | 66 | 1117ms |

Use the evaluator scores to assess and compare responses beyond surface-level phrasing.

### Similarity

Evaluate how closely each model's output aligns with the expected summary. The table below shows the similarity scores for each model.

| Model | Similarity score |
|---------------|------------------|
| GPT-4.1 | 100% |
| DeepSeek-R1 | 50% |
| Phi-4 | 75% |

While all models included the key content from the input, DeepSeek-R1's similarity score is significantly lower due to its verbose internal commentary, which deviates from the expected concise summary format. In contrast, GPT-4.1's response matches the phrasing and structure of the reference output.

### Relevance

Evaluate how well each model captures the core intent of the input. The table below shows the relevance scores for each model.

| Model | Relevance score |
|---------------|-----------------|
| GPT-4.1 | 50% |
| DeepSeek-R1 | 50% |
| Phi-4 | 100% |

All three models recognized the key issue of the app crashing during PDF uploads on mobile. Phi-4 earned a higher relevance score for reflecting the user's perspective more completely. DeepSeek-R1 lost points for introducing speculative technical causes not mentioned in the original input.

### Groundedness

Evaluate whether each model's output stays true to the input without introducing unsupported information.
The table below shows the groundedness scores for each model.

| Model | Groundedness score |
|---------------|--------------------|
| GPT-4.1 | 100% |
| DeepSeek-R1 | 75% |
| Phi-4 | 100% |

Even though DeepSeek-R1 adds internal commentary, it does not introduce hallucinated facts. Its final summary sentence correctly reflects the original input.

## Test case: Dark mode request

**Input**: `Please add dark mode. It's very hard to use at night. My eyes hurt after prolonged use.`

The following tables display each model's output and its evaluator scores:

| Model | Output |
|-------|--------|
| GPT-4.1 | User requests the addition of a dark mode feature due to discomfort and eye strain when using the app at night. |
| DeepSeek-R1 | <think> Okay, the user wants dark mode added because it's hard to use at night and their eyes hurt. I need to make a concise summary. Let me start with the main request: adding dark mode. Then mention the reason: eye strain from prolonged use at night. Keep it to one sentence. Maybe structure it as "Request to implement dark mode to reduce eye strain during nighttime use." That covers the key points without extra details. Check if that's under the character limit. Yep, looks good. </think> Summary: Request to implement dark mode to reduce eye strain caused by prolonged nighttime use. |
| Phi-4 | The customer is requesting the addition of a dark mode feature to reduce eye strain when using the product at night. |

| Model | Similarity | Relevance | Groundedness | Input tokens | Output tokens | Latency |
|---------------|------------|-----------|--------------|--------------|---------------|---------|
| GPT-4.1 | 100% | 75% | 100% | 57 | 18 | 1286ms |
| DeepSeek-R1 | 50% | 0% | 25% | 49 | 128 | 1946ms |
| Phi-4 | 100% | 75% | 100% | 58 | 20 | 899ms |

### Similarity

Evaluate how closely each model's output aligns with the expected summary. The table below shows the similarity scores for each model.

| Model | Similarity score |
|---------------|------------------|
| GPT-4.1 | 100% |
| DeepSeek-R1 | 50% |
| Phi-4 | 100% |

While all models included the key content from the input, DeepSeek-R1's similarity score is again significantly lower due to its verbose internal commentary.

### Relevance

Evaluate how well each model captures the core intent of the input. The table below shows the relevance scores for each model.

| Model | Relevance score |
|---------------|-----------------|
| GPT-4.1 | 75% |
| DeepSeek-R1 | 0% |
| Phi-4 | 75% |

GPT-4.1 and Phi-4 both captured the main intent of the user's request: the need for dark mode to reduce eye strain and improve usability at night. DeepSeek-R1 scored 0% in relevance due to its verbose internal commentary that distracted from the actual output.

### Groundedness

Evaluate whether each model's output stays true to the input without introducing unsupported information.
The table below shows the groundedness scores for each model.

| Model | Groundedness score |
|---------------|--------------------|
| GPT-4.1 | 100% |
| DeepSeek-R1 | 25% |
| Phi-4 | 100% |

DeepSeek-R1 scored lower due to its verbose `<think>` block, which included speculative reasoning not present in the original input.

## Save the configuration

After completing your evaluations, the final step is to choose the model that performs best for your specific use case. In the examples above, Phi-4 and GPT-4.1 delivered strong, consistent results across all evaluators. DeepSeek-R1 scored lower due to its verbose reasoning and less focused outputs.

Once you've selected your preferred model and prompt configuration, add a descriptive name to the prompt file, then click **Commit changes**. This will save the model, prompt, parameter settings, and associated dataset as a reusable configuration file in your repository.



Committing your prompt configuration makes it easy to reuse, collaborate on, and iterate across model setups, and to rerun evaluations and track the performance of the prompt configuration over time.

## Further reading

* [AUTOTITLE](/github-models/use-github-models/storing-prompts-in-github-repositories)
* [AUTOTITLE](/github-models/use-github-models/integrating-ai-models-into-your-development-workflow)
content/github-models/use-github-models/index.md (new file, 12 lines)
@@ -0,0 +1,12 @@
---
title: Use GitHub Models
intro: '{% data variables.product.prodname_github_models %} helps you go from prompt to production by testing, comparing, evaluating, and integrating AI directly in your repository.'
versions:
  feature: github-models
children:
  - /prototyping-with-ai-models
  - /optimizing-your-ai-powered-app-with-github-models
  - /evaluating-ai-models
  - /storing-prompts-in-github-repositories
  - /integrating-ai-models-into-your-development-workflow
---
@@ -0,0 +1,107 @@
---
title: Integrating AI models into your development workflow
intro: 'Call AI models in the tools you use every day.'
versions:
  feature: github-models
shortTitle: Integrate AI models
---

With {% data variables.product.prodname_github_models %} extensions, you can call specific AI models from both {% data variables.product.prodname_copilot_chat_short %} and {% data variables.product.prodname_cli %}. These extensions integrate directly into your development workflow, allowing you to prompt models without context switching.

## Using AI models in {% data variables.product.prodname_copilot_chat_short %}

If you have a {% data variables.product.prodname_copilot_short %} subscription, you can work with AI models in {% data variables.product.prodname_copilot_chat_short %} in two different ways:
* Using the {% data variables.product.prodname_github_models %} {% data variables.product.prodname_copilot_extension_short %}. With this extension, you can ask for model recommendations based on certain criteria and chat with specific models. See [Using the {% data variables.product.prodname_github_models %} {% data variables.product.prodname_copilot_extension_short %}](#using-the-github-models-copilot-extension).
* Using multiple model support in {% data variables.product.prodname_copilot_chat_short %}. With multi-model {% data variables.product.prodname_copilot_chat_short %}, you can choose a specific model to use for a conversation, then prompt {% data variables.product.prodname_copilot_chat_short %} as usual. See [AUTOTITLE](/copilot/using-github-copilot/ai-models/changing-the-ai-model-for-copilot-chat).

### Using the {% data variables.product.prodname_github_models %} {% data variables.product.prodname_copilot_extension_short %}

> [!NOTE] The {% data variables.product.prodname_github_models %} {% data variables.product.prodname_copilot_extension_short %} is in {% data variables.release-phases.public_preview %} and is subject to change.

1. Install the [{% data variables.product.prodname_github_models %} {% data variables.product.prodname_copilot_extension_short %}](https://github.com/marketplace/models-github).
   * If you have a {% data variables.product.prodname_copilot_pro_short %} subscription, you can install the extension on your personal account.
   * If you have access to {% data variables.product.prodname_copilot_short %} through a {% data variables.product.prodname_copilot_business_short %} or {% data variables.product.prodname_copilot_enterprise_short %} subscription:
     * An organization owner or enterprise owner needs to enable the {% data variables.product.prodname_copilot_extensions_short %} policy for your organization or enterprise.
     * An organization owner needs to install the extension for your organization.

1. Open any implementation of {% data variables.product.prodname_copilot_chat_short %} that supports {% data variables.product.prodname_copilot_extensions %}. For a list of supported {% data variables.product.prodname_copilot_chat_short %} implementations, see [AUTOTITLE](/copilot/using-github-copilot/using-extensions-to-integrate-external-tools-with-copilot-chat#supported-clients-and-ides).
1. In the chat window, type `@models YOUR-PROMPT`, then send your prompt. There are several use cases for the {% data variables.product.prodname_github_models %} {% data variables.product.prodname_copilot_extension_short %}, including:
   * Recommending a particular model based on context and criteria you provide. For example, you can ask for a low-cost OpenAI model that supports function calling.
   * Executing prompts using a particular model. This is especially useful when you want to use a model that is not currently available in multi-model {% data variables.product.prodname_copilot_chat_short %}.
   * Listing models currently available through {% data variables.product.prodname_github_models %}.

## Using AI models with {% data variables.product.prodname_actions %}

You can use the {% data variables.product.prodname_actions %} token (`GITHUB_TOKEN`) to call AI models directly inside your workflows.

### Setting permissions

To use AI models in your workflows, ensure that the `models` permission is enabled in your workflow configuration. This permission allows workflows to access the {% data variables.product.prodname_github_models %} inference API. You can either set this permission individually or use the general `read-all` or `write-all` permissions. See [AUTOTITLE](/rest/overview/permissions-required-for-github-apps?apiVersion=2022-11-28#repository-permissions-for-actions).
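
For example, granting just this permission at the workflow level looks like the following (the same block appears in the full workflow example below):

```yaml
permissions:
  models: read
```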

### Writing your workflow file

You can call the inference API directly from your workflow. For instance:

```yaml
name: Use GitHub Models

on:
  workflow_dispatch:

permissions:
  models: read

jobs:
  call-model:
    runs-on: ubuntu-latest
    steps:
      - name: Call AI model
        env:
          GITHUB_TOKEN: {% raw %}${{ secrets.GITHUB_TOKEN }}{% endraw %}
        run: |
          curl "https://models.github.ai/inference/chat/completions" \
            -H "Content-Type: application/json" \
            -H "Authorization: Bearer $GITHUB_TOKEN" \
            -d '{
              "messages": [
                {
                  "role": "user",
                  "content": "Explain the concept of recursion."
                }
              ],
              "model": "openai/gpt-4o"
            }'
```

## Using AI models from the command line

> [!NOTE] The {% data variables.product.prodname_github_models %} extension for {% data variables.product.prodname_cli %} is in {% data variables.release-phases.public_preview %} and is subject to change.

You can use the {% data variables.product.prodname_github_models %} extension for {% data variables.product.prodname_cli %} to prompt AI models from the command line, and even pipe in the output of a command as context.

### Prerequisites

To use the {% data variables.product.prodname_github_models %} CLI extension, you must install {% data variables.product.prodname_cli %}. {% data reusables.cli.cli-installation %}

### Installing the extension

1. If you have not already authenticated to the {% data variables.product.prodname_cli %}, run the following command in your terminal.

   ```shell copy
   gh auth login
   ```

1. To install the {% data variables.product.prodname_github_models %} extension, run the following command.

   ```shell copy
   gh extension install https://github.com/github/gh-models
   ```

### Using the extension

To see a list of all available commands, run `gh models`.

There are a few key ways you can use the extension:
* **To ask a model multiple questions using a chat experience**, run `gh models run`. Select your model from the listed models, then send your prompts.
* **To ask a model a single question**, run `gh models run MODEL-NAME "QUESTION"` in your terminal. For example, to ask the {% data variables.copilot.copilot_gpt_41 %} model why the sky is blue, you can run `gh models run gpt-4.1 "why is the sky blue?"`.
* **To provide the output of a command as context when you call a model**, you can join a separate command and the call to the model with the pipe character (`|`). For example, to summarize the README file in the current directory using the {% data variables.copilot.copilot_gpt_41 %} model, you can run `cat README.md | gh models run gpt-4.1 "summarize this text"`.
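
Put together, a quick terminal session using the commands above might look like this sketch (it assumes the `gpt-4.1` model shown in the examples is available to your account):

```shell
# Ask a model a single question
gh models run gpt-4.1 "why is the sky blue?"

# Pipe command output in as context
cat README.md | gh models run gpt-4.1 "summarize this text"
```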
@@ -0,0 +1,173 @@
---
title: Optimizing your AI-powered app with Models
shortTitle: Optimize your AI-powered app
intro: 'Learn how to test models and refine prompts for your AI-powered application.'
versions:
  feature: github-models
allowTitleToDifferFromFilename: true
---

With new AI models being released regularly, choosing the right one for your application can be challenging. {% data variables.product.prodname_github_models %} helps you optimize your AI-powered application by letting you **compare different models and prompt variations** against sample inputs, while using **built-in evaluators** to validate model output.

Through an example scenario, we'll build an AI-powered assistant that helps users learn how to use Git from the command line. We'll walk through comparing different models, and you'll learn how to refine prompt variations to enhance output quality.

>[!NOTE]
>
> * {% data variables.product.prodname_github_models %} is in {% data variables.release-phases.public_preview %} and subject to change.
> * Usage is rate limited. See [AUTOTITLE](/github-models/use-github-models/prototyping-with-ai-models#rate-limits).

## Testing a prompt

The {% data variables.product.prodname_github_models %} **Comparisons** view allows you to adjust model parameters and prompts to test model output.

### 1. Creating a sample repository

You can access the **Comparisons** view directly from the **Models** tab in any repository, but in this guide we'll create a new repository to use as a test environment.

1. Navigate to the [new repository](https://github.com/new) page.
1. Under "Owner", make sure your user account is selected.
1. In the "Repository name" field, type `models-playground`.
1. Beneath the description field, select **Private** to set the repository visibility.
1. Click **Create repository**.

### 2. Creating a new prompt

1. On the main page of your new repository, click the **Models** tab.
1. In the "Prompts" section, click **{% octicon "plus" aria-hidden="true" %} New Prompt**.
1. In the upper-left corner, choose a model from the dropdown menu.

### 3. Writing a system prompt

A system prompt is a set of instructions that defines the role, behavior, and limitations of an AI model before it interacts with users. In this example, we'll be working on an AI-powered application that explains how to use Git from the command line.

In the **System prompt** field, copy and paste the following text:

```text copy
You are an expert at using the Git version control system. I will ask questions looking for guidance on the best way to perform tasks using Git, and you will give clear, step-by-step answers that explain each step you are recommending.
```

> [!NOTE] If the **System** text field is not editable, try choosing a different model in the model dropdown above. Not all models allow the system prompt to be modified.

### 4. Writing a user prompt

The user prompt is the direct question or instruction that a user sends to the AI model during a conversation, and which the model responds to.

In the **User prompt** field, copy and paste the following text:

```text copy
I want to learn how to use Git from the command line. {{input}}
```

### 5. Entering sample input

The {% raw %}`{{input}}`{% endraw %} variable in the **User prompt** functions as a placeholder for sample input. To manage this placeholder, click the **Variables** button and enter the following text:

```text copy
When should I use rebase or merge?
```

### 6. Running the sample prompt

1. In the upper-right corner, click **{% octicon "play" aria-hidden="true" %} Play**.
1. Make a change to the model or prompt, then run the prompt again to see what results you get.

## Testing different models against a prompt

Now, let's determine which model will work best for our application, using the **Comparisons** view. This view allows you to test different models on the same input, revealing differences in accuracy, creativity, tone, reasoning, and reliability. That will help us choose the model that best fits our needs in terms of quality, speed, cost, and consistency.

1. In the upper-left corner, click **Comparisons**.
1. To compare different models, click **{% octicon "plus" aria-hidden="true" %} Add prompt** and select **{% octicon "repo-forked" aria-hidden="true" %} Copy original prompt** to duplicate your existing system and user prompts. Create at least two copies of the original prompt to evaluate three different models.
1. Next to each prompt, click {% octicon "pencil" aria-label="Edit prompt" %}. From the **Model** dropdown, choose a different model for each prompt you created.
1. Click **{% octicon "plus" aria-hidden="true" %} Add inputs** to create new rows for sample inputs.

   * Click **{% octicon "plus" aria-hidden="true" %} Add inputs**. Then, in the "Input" field, copy and paste the following text:

     ```text copy
     How do I modify the most recent commit message in my current branch?
     ```

   * Click **{% octicon "plus" aria-hidden="true" %} Add inputs** again, then paste the following input:

     ```text copy
     How do I move a specific commit from one branch to a different branch?
     ```

   * Click **{% octicon "plus" aria-hidden="true" %} Add inputs** one more time, and paste this input:

     ```text copy
     How do I find the author of a specific commit in a repository's history?
     ```

1. To run the prompts, in the upper-right corner, click **{% octicon "play" aria-hidden="true" %} Run**.
1. Try out different models against your prompt, making note of the **Latency** and the **Input** and **Output** token usage of the different models.

## Testing prompt variations with a specific model

If you're building an application with a specific AI model, you need responses to be predictable and reliable. Testing prompt variations helps you to:

* **Optimize performance and quality**: Slight changes in phrasing can affect the response quality. By testing variations, you can find the wording that gets the best response.
* **Clarify instructions**: By varying the phrasing of your prompt, you can identify which version the model understands most clearly.
* **Adapt to specific model behavior**: You can tailor your input to how a specific model interprets language.
* **Verify the format of the output**: You may want a list, a paragraph, a code block, or a specific tone. Testing prompt variations helps you enforce a specific structure or style.

Now, let's use {% data variables.product.prodname_github_models %} to test prompt variations against user input for your specific model.

### 1. Adding prompt variations

For this example scenario, select the **same model for each column**, but provide a different prompt variation by editing the "User prompt" field of our existing prompts. Next to each prompt, click {% octicon "pencil" aria-label="Edit prompt" %}.

1. Click **{% octicon "pencil" aria-label="Edit prompt" %}** in "Prompt 2." Then, in the "User prompt" field, copy and paste the following text:

   ```text copy
   I want to learn how to use Git from the command line, but explain it to me like I am five years old. {{input}}
   ```

1. Click **{% octicon "pencil" aria-label="Edit prompt" %}** in "Prompt 3", then paste the following input:

   ```text copy
   I want to learn how to use Git from the command line. Give me instructions in the form of a haiku. {{input}}
   ```

### 2. Running prompt variations

1. To run the prompts, in the upper-right corner, click **{% octicon "play" aria-hidden="true" %} Run**.
1. Try out different prompt variations, and compare the types of output the model gives you.

## Evaluating model output

We've tested different models and prompt variations in {% data variables.product.prodname_github_models %}, and the next step is interpreting and comparing the results to make informed decisions for our AI-powered application.

As you ran the models in the example scenarios, the **Input** and **Output** token usage and **Latency** were displayed after each run. Token usage matters because it directly affects **cost, performance, and model limitations.**

* Since most models charge per token both for input and output, using more tokens increases your cost.
* Each model also has a maximum token limit (called a context window), and exceeding it can result in errors or truncated responses.
* Longer prompts can slow down response time or reduce clarity, while concise prompts often lead to better, more efficient outputs.

Using {% data variables.product.prodname_github_models %} to test token usage and latency helps you stay within limits, manage costs, and improve the overall effectiveness of your AI-powered application.

### Using evaluators to judge output

Depending on the number of prompts and models you are evaluating, the model output can be overwhelming to sort through manually. To help assess the quality of each model's output, you can use **Evaluators** to score results across key dimensions like clarity, accuracy, and relevance. You can define your own evaluation criteria, or use built-in evaluators to automatically rate outputs, making it easier to identify the best-performing model and prompt variation.

For this example scenario, let's use the **String check** evaluator to check for a string in the output.

1. In the lower-right corner of the **Prompts** field, click **{% octicon "plus" aria-hidden="true" %} Add evaluator** and select **{% octicon "note" aria-hidden="true" %} String check**.
1. In the **Name** field, enter "Amend check", then copy and paste the following input for the **Value** field:

   ```text copy
   git commit --amend
   ```

1. To run the prompts, in the upper-right corner, click **{% octicon "play" aria-hidden="true" %} Run**.
1. The prompt output will now show a **Pass** or **Fail** label, letting you know which model's output contained the required string.

To learn more about the pre-built evaluators like similarity, groundedness, and relevance, see [AUTOTITLE](/github-models/use-github-models/evaluating-ai-models#evaluating-outputs).

## Next steps

Now that you've explored how to test models, refine prompts, and evaluate outputs using {% data variables.product.prodname_github_models %}, you're ready to start building prompts for your AI-powered application. After creating a prompt, you can store, version, and share it by committing a `.prompt.yml` file to your repository. This keeps your prompts under version control, and enables easy collaboration on prompt and model refinement. For more information, see [AUTOTITLE](/github-models/use-github-models/storing-prompts-in-github-repositories).

## Join the community

To ask questions and share feedback, see this [GitHub Models discussion post](https://github.com/orgs/community/discussions/159087).
To learn how others are using {% data variables.product.prodname_github_models %}, visit the [GitHub Community discussions for Models](https://github.com/orgs/community/discussions/categories/models).
@@ -0,0 +1,361 @@
---
title: Prototyping with AI models
shortTitle: Prototype with AI models
intro: 'Find and experiment with AI models for free.'
versions:
  feature: github-models
redirect_from:
  - /github-models/prototyping-with-ai-models
---

If you want to develop a generative AI application, you can use {% data variables.product.prodname_github_models %} to find and experiment with AI models for free. Once you are ready to bring your application to production, you can switch to a token from a paid Azure account. See the [Azure AI](https://aka.ms/azureai/github-models) documentation.

See also [AUTOTITLE](/github-models/responsible-use-of-github-models).

## Finding AI models

To find an AI model:

{% data reusables.models.steps-to-open-model-playground %}

The model is opened in the model playground. Details of the model are displayed in the sidebar on the right. If the sidebar is not displayed, expand it by clicking the **{% octicon "sidebar-expand" aria-label="Show parameters setting" %}** icon at the right of the playground.

> [!NOTE] Access to OpenAI's models is in {% data variables.release-phases.public_preview %} and subject to change.

## Experimenting with AI models in the playground

The AI model playground is a free resource that allows you to adjust model parameters and submit prompts to see how a model responds.

>[!NOTE]
>
> * The model playground is in {% data variables.release-phases.public_preview %} and subject to change.
> * The playground is rate limited. See [Rate limits](#rate-limits) below.

To adjust parameters for the model, in the playground, select the **Parameters** tab in the sidebar.

To see code that corresponds to the parameters that you selected, switch from the **Chat** tab to the **Code** tab.



### Comparing models

You can submit a prompt to two models at the same time and compare the responses.

With one model open in the playground, click **Compare**, then, in the dropdown menu, select a model for comparison. The selected model opens in a second chat window. When you type a prompt in either chat window, the prompt is mirrored to the other window. The prompts are submitted simultaneously so that you can compare the responses from each model.

Any parameters you set are used for both models.

## Evaluating AI models

Once you've started testing prompts in the playground, you can evaluate model performance using structured metrics. Evaluations help you compare multiple prompt configurations across different models and determine which setup performs best.

In the Comparisons view, you can apply evaluators like similarity, relevance, and groundedness to measure how well each output meets your expectations. You can also define your own evaluation criteria with a custom prompt evaluator.

For step-by-step instructions, see [Evaluating outputs](/github-models/use-github-models/evaluating-ai-models#evaluating-outputs).

## Experimenting with AI models using the API

>[!NOTE]
>
> The free API usage is in {% data variables.release-phases.public_preview %} and subject to change.

{% data variables.product.company_short %} provides free API usage so that you can experiment with AI models in your own application.

The steps to use each model are similar. In general, you will need to:

{% data reusables.models.steps-to-open-model-playground %}

   The model opens in the model playground.

1. Click the **Code** tab.
1. Optionally, use the language dropdown to select the programming language.
1. Optionally, use the SDK dropdown to select which SDK to use.

   All models can be used with the Azure AI Inference SDK, and some models support additional SDKs. If you want to easily switch between models, you should select "Azure AI Inference SDK". If you selected "REST" as the language, you won't use an SDK. Instead, you will use the API endpoint directly. {% ifversion fpt %} See [{% data variables.product.prodname_github_models %} REST API](/rest/models?apiVersion=2022-11-28). {% endif %}
1. Either open a codespace, or set up your local environment:
   * To run in a codespace, click **{% octicon "codespaces" aria-hidden="true" %} Run codespace**, then click **Create new codespace**.
   * To run locally:
     * Create a {% data variables.product.company_short %} {% data variables.product.pat_generic %}. The token needs to have `models:read` permissions. See [AUTOTITLE](/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens).
     * Save your token as an environment variable.
     * Install the dependencies for the SDK, if required.
1. Use the example code to make a request to the model.
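
If you selected "REST", the request is a plain HTTP call. As a minimal sketch, it reuses the same endpoint and request body shown in the {% data variables.product.prodname_actions %} example in [AUTOTITLE](/github-models/use-github-models/integrating-ai-models-into-your-development-workflow), with your {% data variables.product.pat_generic %} saved in the `GITHUB_TOKEN` environment variable:

```shell
# Call the chat completions endpoint directly with curl.
# Assumes GITHUB_TOKEN holds a personal access token with models:read.
curl "https://models.github.ai/inference/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $GITHUB_TOKEN" \
  -d '{
    "messages": [
      { "role": "user", "content": "Explain the concept of recursion." }
    ],
    "model": "openai/gpt-4o"
  }'
```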

The free API usage is rate limited. See [Rate limits](#rate-limits) below.

## Saving and sharing your playground experiments

You can save and share your progress in the playground with presets. Presets save:
* Your current state
* Your parameters
* Your chat history (optional)

To create a preset for your current context, select **Preset: PRESET-NAME** {% octicon "triangle-down" aria-hidden="true" %} at the top right of the playground, then click **{% octicon "plus" aria-hidden="true" %} Create new preset**. You need to name your preset, and you can also choose to provide a preset description, include your chat history, and allow your preset to be shared.

There are two ways to load a preset:
* Select the **Preset: PRESET-NAME** {% octicon "triangle-down" aria-hidden="true" %} dropdown menu, then click the preset you want to load.
* Open a shared preset URL.

After you load a preset, you can edit, share, or delete the preset:
* To edit the preset, change the parameters and prompt the model. Once you are satisfied with your changes, select the **Preset: PRESET-NAME** {% octicon "triangle-down" aria-hidden="true" %} dropdown menu, then click **{% octicon "pencil" aria-hidden="true" %} Edit preset** and save your updates.
* To share the preset, select the **Preset: PRESET-NAME** {% octicon "triangle-down" aria-hidden="true" %} dropdown menu, then click **{% octicon "share" aria-hidden="true" %} Share preset** to get a shareable URL.
* To delete the preset, select the **Preset: PRESET-NAME** {% octicon "triangle-down" aria-hidden="true" %} dropdown menu, then click **{% octicon "trash" aria-hidden="true" %} Delete preset** and confirm the deletion.

## Using the prompt editor

The prompt editor in {% data variables.product.prodname_github_models %} is designed to help you iterate, refine, and perfect your prompts. This dedicated view provides a focused and intuitive experience for crafting and testing inputs, enabling you to:

* Quickly test and refine prompts without the complexity of multi-turn interactions.
* Fine-tune prompts for precision and relevance in your projects.
* Use a specialized space for single-turn scenarios to ensure consistent and optimized results.

To access the prompt editor, click **{% octicon "stack" aria-hidden="true" %} Prompt editor** at the top right of the playground.



## Experimenting with AI models in {% data variables.product.prodname_vscode %}

> [!NOTE] The AI Toolkit extension for {% data variables.product.prodname_vscode %} is in {% data variables.release-phases.public_preview %} and is subject to change.

If you prefer to experiment with AI models in your IDE, you can install the AI Toolkit extension for {% data variables.product.prodname_vscode %}, then test models with adjustable parameters and context.

1. In {% data variables.product.prodname_vscode %}, install the pre-release version of the [AI Toolkit for {% data variables.product.prodname_vscode %}](https://marketplace.visualstudio.com/items?itemName=ms-windows-ai-studio.windows-ai-studio).
1. To open the extension, click the AI Toolkit icon in the activity bar.
1. Authorize the AI Toolkit to connect to your {% data variables.product.prodname_dotcom %} account.
1. In the "My models" section of the AI Toolkit panel, click **Open Model Catalog**, then find a model to experiment with.
   * To use a model hosted remotely through {% data variables.product.prodname_github_models %}, on the model card, click **Try in playground**.
   * To download and use a model locally, on the model card, click **Download**. Once the download is complete, on the same model card, click **Load in playground**.

1. In the sidebar, provide any context instructions and inference parameters for the model, then send a prompt.

## Going to production

The rate limits for the playground and free API usage are intended to help you experiment with models and develop your AI application. Once you are ready to bring your application to production, you can use a token from a paid Azure account instead of your {% data variables.product.company_short %} {% data variables.product.pat_generic %}. You don't need to change anything else in your code.

For more information, see the [Azure AI](https://aka.ms/azureai/github-models) documentation.

## Rate limits

The playground and free API usage are rate limited by requests per minute, requests per day, tokens per request, and concurrent requests. If you get rate limited, you will need to wait for the rate limit that you hit to reset before you can make more requests.

Low, high, and embedding models have different rate limits. To see which type of model you are using, refer to the model's information in {% data variables.product.prodname_marketplace %}.
| Rate limit tier | Rate limits | Copilot Free | Copilot Pro | Copilot Business | Copilot Enterprise |
|---|---|---|---|---|---|
| **Low** | Requests per minute | 15 | 15 | 15 | 20 |
| | Requests per day | 150 | 150 | 300 | 450 |
| | Tokens per request | 8000 in, 4000 out | 8000 in, 4000 out | 8000 in, 4000 out | 8000 in, 8000 out |
| | Concurrent requests | 5 | 5 | 5 | 8 |
| **High** | Requests per minute | 10 | 10 | 10 | 15 |
| | Requests per day | 50 | 50 | 100 | 150 |
| | Tokens per request | 8000 in, 4000 out | 8000 in, 4000 out | 8000 in, 4000 out | 16000 in, 8000 out |
| | Concurrent requests | 2 | 2 | 2 | 4 |
| **Embedding** | Requests per minute | 15 | 15 | 15 | 20 |
| | Requests per day | 150 | 150 | 300 | 450 |
| | Tokens per request | 64000 | 64000 | 64000 | 64000 |
| | Concurrent requests | 5 | 5 | 5 | 8 |
| **Azure OpenAI o1-preview** | Requests per minute | Not applicable | 1 | 2 | 2 |
| | Requests per day | Not applicable | 8 | 10 | 12 |
| | Tokens per request | Not applicable | 4000 in, 4000 out | 4000 in, 4000 out | 4000 in, 8000 out |
| | Concurrent requests | Not applicable | 1 | 1 | 1 |
| **Azure OpenAI o1 and o3** | Requests per minute | Not applicable | 1 | 2 | 2 |
| | Requests per day | Not applicable | 8 | 10 | 12 |
| | Tokens per request | Not applicable | 4000 in, 4000 out | 4000 in, 4000 out | 4000 in, 8000 out |
| | Concurrent requests | Not applicable | 1 | 1 | 1 |
| **Azure OpenAI o1-mini, o3-mini, and o4-mini** | Requests per minute | Not applicable | 2 | 3 | 3 |
| | Requests per day | Not applicable | 12 | 15 | 20 |
| | Tokens per request | Not applicable | 4000 in, 4000 out | 4000 in, 4000 out | 4000 in, 4000 out |
| | Concurrent requests | Not applicable | 1 | 1 | 1 |
| **DeepSeek-R1 and MAI-DS-R1** | Requests per minute | 1 | 1 | 2 | 2 |
| | Requests per day | 8 | 8 | 10 | 12 |
| | Tokens per request | 4000 in, 4000 out | 4000 in, 4000 out | 4000 in, 4000 out | 4000 in, 4000 out |
| | Concurrent requests | 1 | 1 | 1 | 1 |

These limits are subject to change without notice.

## Leaving feedback

To ask questions and share feedback, see this [GitHub Models discussion post](https://github.com/orgs/community/discussions/159087).
To learn how others are using {% data variables.product.prodname_github_models %}, visit the [GitHub Community discussions for Models](https://github.com/orgs/community/discussions/categories/models).
@@ -0,0 +1,60 @@
---
title: Storing prompts in GitHub repositories
shortTitle: Store prompts
intro: 'Store prompts directly in your GitHub repositories to leverage automated text summarization and other AI-driven functionalities.'
allowTitleToDifferFromFilename: true
versions:
  fpt: '*'
  ghec: '*'
---

Prompts can be stored as files directly within GitHub repositories. This unlocks the ability to view your prompts in an organized UI, share them with non-technical stakeholders, and run seamless iterations and comparisons on adjustments to models and prompts.

## Benefits

* Easy integration with the new suite of AI development tools directly on GitHub.
* Simple to start with, and scales to complex use cases.
* Uses a widely supported format, compatible with existing tools.

## Supported file format

Store prompts in Markdown files with optional YAML front matter.

The file can be located anywhere in your repository, but it **must have the extension `.prompt.md`**.

Example:

```yaml
---
name: Summarizer
description: Summarizes a given text
model: openai/gpt-4o
model_parameters:
  temperature: 0.5
---
system:
You are a text summarizer. Your only job is to summarize the text given to you.
user:
Summarize the given text:
<text>
{% raw %}{{text}}{% endraw %}
</text>
```

## Prompt structure

Prompts have two key parts:

* **Runtime information** (required)
  * Prompt templates (system, user, etc.) using simple {% raw %}`{{variable}}`{% endraw %} placeholders
* **Development information** (optional)
  * Human-readable name and description
  * Model identifier and parameters
  * Sample data for testing and evaluations

## Limitations

The following are not supported:

* Complex templating languages
* Proprietary or complex file formats (such as `.ghprompt` or `.prompty`)