[DO NOT MERGE] MSFT Build Copilot coding agent megabranch (#55536)
Co-authored-by: hubwriter <hubwriter@github.com>
Co-authored-by: Felicity Chapman <felicitymay@github.com>
Co-authored-by: Joe Clark <31087804+jc-clark@users.noreply.github.com>
Co-authored-by: Isaac Brown <101839405+isaacmbrown@users.noreply.github.com>
Co-authored-by: Anne-Marie <102995847+am-stead@users.noreply.github.com>
Co-authored-by: felix <guntrip@github.com>
Co-authored-by: Felix Guntrip <stevecat@github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Laura Coursen <lecoursen@github.com>
Co-authored-by: Tim Rogers <timrogers@github.com>
Co-authored-by: Sarah Schneider <sarahs@users.noreply.github.com>
Co-authored-by: Sophie <29382425+sophietheking@users.noreply.github.com>
Co-authored-by: Nhu Do <nhu-do@github.com>
Co-authored-by: Ellie Bennett <34038869+elliebennett@users.noreply.github.com>
Co-authored-by: Janice <janiceilene@github.com>
Co-authored-by: Melanie Yarbrough <11952755+myarb@users.noreply.github.com>
Co-authored-by: Sarah Schneider <sarahs@github.com>
Co-authored-by: Ben Ahmady <32935794+subatoi@users.noreply.github.com>
Co-authored-by: Sunbrye Ly <56200261+sunbrye@users.noreply.github.com>
Co-authored-by: Steve Ward <steveward@github.com>
Co-authored-by: Vanessa <vgrl@github.com>
Co-authored-by: mc <42146119+mchammer01@users.noreply.github.com>
Co-authored-by: Kate Catlin <KateCatlin@users.noreply.github.com>
Co-authored-by: Claire W <78226508+crwaters16@users.noreply.github.com>
64
content/github-models/about-github-models.md
Normal file
@@ -0,0 +1,64 @@
---
title: About GitHub Models
intro: '{% data variables.product.prodname_github_models %} is a suite of developer tools that take you from AI idea to ship, including a model catalog, prompt management, and quantitative evaluations.'
versions:
  feature: github-models
shortTitle: About GitHub Models
topics:
  - GitHub Models
---

{% data reusables.models.models-preview-note %}

## Overview

{% data variables.product.prodname_github_models %} is a workspace that lowers the barrier to enterprise-grade AI adoption. It helps you move beyond isolated experimentation by embedding AI development directly into familiar GitHub workflows. {% data variables.product.prodname_github_models %} provides tools to test large language models (LLMs), refine prompts, evaluate outputs, and make informed decisions based on structured metrics. To get started, see [AUTOTITLE](/github-models/use-github-models/optimizing-your-ai-powered-app-with-github-models).

## Capabilities

GitHub Models offers a set of features to support prompt iteration, evaluation, and integration for AI development.

* **Prompt development**: Start AI development directly in a structured editor that supports system instructions, test inputs, and variable configuration.
* **Model comparison**: Test multiple models side by side with identical prompts and inputs to experiment with different outputs.
* **Evaluators**: Use scoring metrics such as similarity, relevance, and groundedness to analyze outputs and track performance.
* **Prompt configurations**: Save prompt, model, and parameter settings as `.prompt.yml` files in your repository. This enables review, collaboration, and reproducibility.
* **Production integration**: Use your saved configuration to build AI features or connect through SDKs and{% ifversion fpt %} the [{% data variables.product.prodname_github_models %} REST API](/rest/models?apiVersion=2022-11-28). {% else %} APIs. {% endif %}

## Enabling GitHub Models

There are a few ways you can start using {% data variables.product.prodname_github_models %}, depending on your role and needs.

To use the {% data variables.product.prodname_github_models %} API, see [Experimenting with AI models using the API](/github-models/use-github-models/prototyping-with-ai-models#experimenting-with-ai-models-using-the-api).

### For individuals

To use {% data variables.product.prodname_github_models %}, create a new GitHub repository or open an existing one. In the repository settings, click **Models** in the sidebar and enable the feature.

### For organizations and enterprises

To use {% data variables.product.prodname_github_models %} in your organization, an enterprise owner must first enable the feature. Organization owners can then configure which models are allowed.

See [AUTOTITLE](/github-models/github-models-at-scale/manage-models-at-scale).

## Prompts

Manage your prompt configurations stored in the repository. Each prompt is saved as a `.prompt.yml` file, which defines the model, parameters, and test inputs. From here, you can create, edit, and organize prompts to support experimentation or production use.
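
For illustration, a minimal `.prompt.yml` file might look something like the following sketch. The model name and parameter values shown here are placeholders rather than recommendations.

```yaml
# Illustrative sketch of a .prompt.yml file; the model and parameter values are placeholders.
messages:
  - role: system
    content: You are a summarizer of GitHub issues.
  - role: user
    content: 'Summarize this issue - {% raw %}{{input}}{% endraw %}'
model: gpt-4o
modelParameters:
  max_tokens: 4096
```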

## Comparisons

Use the Comparisons view to evaluate the outputs of multiple prompt configurations in a consistent, test-driven workflow. Run tests across rows of input data and view evaluator scores for each configuration, such as similarity, relevance, and groundedness. This view is ideal for refining prompts, validating changes, and avoiding regressions.

## Playground

Use the Playground to quickly explore models and test prompt ideas in real time. The Playground is ideal for early experimentation, helping you understand a model’s behavior, capabilities, and response style. You can interactively select models, adjust parameters, and compare responses side by side.

## Join the community

To ask questions and share feedback, see this [GitHub Models discussion post](https://github.com/orgs/community/discussions/159087).
To learn how others are using {% data variables.product.prodname_github_models %}, visit the [GitHub Community discussions for Models](https://github.com/orgs/community/discussions/categories/models).

## Further reading

* [AUTOTITLE](/github-models/use-github-models/prototyping-with-ai-models)
* [AUTOTITLE](/github-models/use-github-models/optimizing-your-ai-powered-app-with-github-models)
* [AUTOTITLE](/github-models/use-github-models/evaluating-ai-models)
9
content/github-models/github-models-at-scale/index.md
Normal file
@@ -0,0 +1,9 @@
---
title: GitHub Models at scale
intro: 'Manage {% data variables.product.prodname_github_models %} in your enterprise and organizations.'
versions:
  feature: github-models
children:
  - /use-models-at-scale
  - /manage-models-at-scale
---
@@ -0,0 +1,50 @@
---
title: Managing your team's model usage
shortTitle: Manage Models at scale
intro: Control and secure AI models in your organization with {% data variables.product.prodname_github_models %}.
versions:
  feature: github-models
permissions: Organization owners can manage the models used in an organization
topics:
  - Enterprise
allowTitleToDifferFromFilename: true
---

{% data reusables.models.models-preview-note %}

## Why restrict model usage in your organization?

Limiting the models available to your developers can help **control spend on models and meet your governance, data security, and compliance requirements**.

If you don't manage access, your teams may inadvertently use models that do not meet your organization’s standards, leading to potential risks such as:

* Unexpected costs from high-priced models
* Security or compliance issues caused by unauthorized AI services
* Time wasted integrating unapproved or suboptimal models

For more information about using models at scale, see [AUTOTITLE](/github-models/github-models-at-scale/use-models-at-scale).

## Exceptions to your organization's model settings

While {% data variables.product.prodname_github_models %} for organizations and repositories is in public preview, some of your organization's model settings are not applied in certain circumstances. Your teams will be able to use AI models without limitation in the following places:

* {% data variables.product.prodname_emus %} organizations
* {% data variables.product.prodname_github_models %} extension for {% data variables.product.prodname_cli %}
* {% data variables.product.prodname_github_models %} extension for {% data variables.product.prodname_copilot_chat %}
* {% data variables.product.prodname_github_models %} VS Code extension
* Playground for {% data variables.product.prodname_github_models %} in the {% data variables.product.prodname_marketplace %} at https://github.com/marketplace/models

## Enabling {% data variables.product.prodname_github_models %} for an enterprise

{% data reusables.models.prereq-enable-models-in-enterprise %}

{% data reusables.enterprise-accounts.access-enterprise %}
{% data reusables.enterprise-accounts.policies-tab %}
{% data reusables.enterprise-accounts.models-policies %}
1. Under "Models", in the "Models in your enterprise" section, click {% octicon "chevron-down" aria-label="the down arrow" %} beside **Disabled** and select **Enabled** from the dropdown.

## Controlling model usage in your organization

{% data reusables.models.enterprise-change-models-settings %}

{% data reusables.models.enable-select-models-in-org %}
@@ -0,0 +1,179 @@
---
title: Using {% data variables.product.prodname_github_models %} to develop AI-powered applications in your enterprise
shortTitle: Use Models at scale
intro: Streamline AI development in your enterprise.
versions:
  feature: github-models
permissions: 'Organization owners and enterprise owners'
topics:
  - Enterprise
  - AI
  - GitHub Models
allowTitleToDifferFromFilename: true
---

{% data reusables.models.models-preview-note %}

{% data variables.product.prodname_github_models %} allows your developers to build AI-powered applications at scale while your enterprise maintains control, compliance, and cost efficiency.

## Why {% data variables.product.prodname_github_models %}?

* **Centralized model management:** Control which AI models and providers are available to developers across your organization.
* **AI development at speed:** Quickly prototype, evaluate, and optimize prompts and models.
* **Governance and compliance controls:** Enforce your organization's standards and monitor model usage.
* **Cost optimization:** Avoid unexpected costs from high-priced models.
* **Collaboration:** Share prompts and results using standard {% data variables.product.github %} development practices.
* **Security-focused architecture:** Rest assured that your data remains within {% data variables.product.github %} and Azure and is not shared with model providers.
* **Visual interface:** Allow non-technical team members to contribute alongside developers.
* **API access:** Use the {% data variables.product.prodname_github_models %} REST API to automate and integrate with enterprise workflows.
* **Version control:** All prompt and model changes go through a standard {% data variables.product.github %} commit and pull request flow so you know when and why a prompt changed.

See [AUTOTITLE](/github-models/about-github-models).

## Best practices for using {% data variables.product.prodname_github_models %} at scale

### Compare and evaluate AI models for governance and compliance

Review and compare available AI models against your company’s governance, data security, and compliance requirements. You can do this in any Models-enabled {% data variables.product.github %} repository or in the {% data variables.product.prodname_github_models %} catalog from the {% data variables.product.prodname_marketplace %} at https://github.com/marketplace?type=models. Your considerations may include:

* **Governance and security:** Examine each model's compliance with standards and regulations such as GDPR, SOC 2, and ISO 27001, and ensure data is not persisted outside of your organization unless explicitly logged with consent.
* **Model performance:** Run benchmark evaluations on your internal datasets to assess reasoning, context retention, and hallucination rates.
* **API control and visibility:** Require fine-grained controls over usage quotas, prompt inspection, and rate limits at a team or organization level.
* **Cost optimization:** Include token pricing, inference speed, and the availability of model variants for tiered use. For example, you can use cheaper models for test case generation compared to advanced models for architecture discussions.

Once you have decided which models you want to use, you can limit access in your organization to only those models. See [AUTOTITLE](/github-models/github-models-at-scale/manage-models-at-scale).

### Optimize and share prompts across teams

Your developers can use the prompt editor in {% data variables.product.prodname_github_models %} to create and refine prompts. Teams can experiment with different prompt variations and models in a stable, non-production environment that integrates with {% data variables.product.github %} development workflows. The visual interface allows non-technical stakeholders to contribute alongside developers. See [Using the prompt editor](/github-models/use-github-models/prototyping-with-ai-models#using-the-prompt-editor).

The lightweight evaluation tooling allows your team to compare results across common metrics like latency, relevance, and groundedness, or you can create custom evaluators. Compare prompt and model performance for your specific generative AI use cases, such as creating code, tests, documentation, or code review suggestions.

As your team creates effective prompts, they can save them as YAML files and share them for review using {% data variables.product.github %} pull requests. Committed prompts are accessible to other teams and workflows and can be kept consistent with your company's standards. This centralized and collaborative approach to prompt management accelerates development and can help you enforce best practices across your organization.

### Evaluate and optimize model usage costs

As adoption of your AI-powered application grows and AI models improve, use {% data variables.product.prodname_github_models %} to evaluate the cost and performance of different models and model updates. Select the most cost-effective options for your organization's needs and manage expenses as usage scales across multiple teams.

### Use the {% data variables.product.prodname_github_models %} REST API or extensions for programmatic management

To more efficiently manage resources across all teams, you can leverage the {% data variables.product.prodname_github_models %} REST API to:

* **Manage and update organization settings:** Programmatically update model access permissions and governance settings across multiple teams at once, to ensure consistency and compliance.
* **List and retrieve prompts:** List, retrieve, and audit prompts used by different teams, to monitor usage, share successful prompts, and maintain a central repository of best practices.
* **Run model inference requests:** Run inference requests for specific models and parameters such as frequency penalty, maximum tokens, response format, and presence penalty.

You can also use these extensions to run inference requests and manage prompts:
* {% data variables.product.prodname_github_models %} extension for {% data variables.product.prodname_cli %}
* {% data variables.product.prodname_github_models %} extension for {% data variables.product.prodname_copilot_chat %}
* {% data variables.product.prodname_github_models %} VS Code extension

### Monitor, iterate, and integrate

With built-in governance features, you can monitor model usage and ensure ongoing compliance with company policies. Audit logs provide visibility into who accessed or modified models and prompts. The {% data variables.product.prodname_github_models %} repository integration allows all stakeholders to collaborate and continuously iterate on AI-powered applications.

## Example: Use {% data variables.product.prodname_github_models %} with {% data variables.product.prodname_actions %} to summarize issues

Large software development projects often contain issues full of technical details. You can roll out AI-powered issue summaries using {% data variables.product.prodname_github_models %} and {% data variables.product.prodname_actions %}.

**Prerequisite:** Enable {% data variables.product.prodname_github_models %} in your organization, and set the models and publishers you want to make available to individual repositories.

1. **Create a prompt in a repository**

   In the "Models" tab of a repository, create a prompt using the prompt editor.

   Example system prompt:

   > You are a summarizer of GitHub issues. Emphasize key technical points or important questions.

   Example user prompt:

   > Summarize this issue - {% raw %}{{input}}{% endraw %}

1. **Run and iterate on your prompt**

   Run your prompt. Provide some sample issue content in the "Variables" pane as the value of `{% raw %}{{input}}{% endraw %}`.

   Try different models (for example, OpenAI GPT-4o) and compare results. Adjust parameters such as max tokens and temperature. Iterate until you are satisfied with the results.

1. **Optionally, run more extensive tests**

   The "Compare" view allows you to run multiple versions of your prompt against different models simultaneously and see how the results compare in a grid view. You can also define and use evaluators to ensure that the results contain certain keywords or meet other standards.

1. **Commit your prompt**

   Name your prompt and commit changes to go through the pull request flow. For example, if you name your prompt `summarize`, you'll get a `summarize.prompt.yml` file at the root level of your repository that looks something like this:

   ```yaml
   messages:
     - role: system
       content: >-
         You are a summarizer of GitHub issues. Emphasize key technical points or
         important questions.
     - role: user
       content: 'Summarize this issue, please - {% raw %}{{input}}{% endraw %}'
   model: gpt-4o
   modelParameters:
     max_tokens: 4096
   ```

   Once your pull request is reviewed and merged, your prompt will be available for anyone to use in the repository.

1. **Call your prompt in a workflow**

   For information on creating workflows, see [AUTOTITLE](/actions/writing-workflows).

   You need to set `models: read` permission to allow a prompt to be called in a workflow.

   Here's an example workflow that adds an AI-generated summary as a comment on any newly created issue:

   ```yaml copy
   name: Summarize New Issue

   on:
     issues:
       types: [opened]

   permissions:
     issues: write
     contents: read
     models: read

   jobs:
     summarize_issue:
       runs-on: ubuntu-latest
       steps:
         - name: Checkout repository
           uses: {% data reusables.actions.action-checkout %}

         - name: Install gh-models extension
           run: gh extension install https://github.com/github/gh-models
           env:
             GH_TOKEN: ${% raw %}{{ github.token }}{% endraw %}

         - name: Create issue body file
           run: |
             cat > issue_body.txt << 'EOT'
             ${% raw %}{{ github.event.issue.body }}{% endraw %}
             EOT

         - name: Summarize new issue
           run: |
             cat issue_body.txt | gh models run --file summarize.prompt.yml > summary.txt
           env:
             GH_TOKEN: ${% raw %}{{ github.token }}{% endraw %}

         - name: Update issue with summary
           run: |
             SUMMARY=$(cat summary.txt)
             gh issue comment ${% raw %}{{ github.event.issue.number }}{% endraw %} --body "### Issue Summary
             ${SUMMARY}"
           env:
             GH_TOKEN: ${% raw %}{{ github.token }}{% endraw %}
   ```

1. **Monitor and iterate**

   You can monitor the performance of the action and iterate on the prompt and model selection using the {% data variables.product.prodname_github_models %} prompt editor. You can also use the [CLI extension](https://github.com/github/gh-models) to test locally, or use the {% ifversion fpt %} [{% data variables.product.prodname_github_models %} REST API](/rest/models?apiVersion=2022-11-28) {% else %} API {% endif %}to programmatically update the prompt and model settings.

   You may also want to consider saving the model response as a file in your repository, so that you can review and iterate on the model's performance over time. This allows you to continuously improve the quality of the summaries and ensure they meet your team's needs.
@@ -6,7 +6,8 @@ versions:
  ghes: '*'
  ghec: '*'
children:
  - /prototyping-with-ai-models
  - /integrating-ai-models-into-your-development-workflow
  - /about-github-models
  - /use-github-models
  - /github-models-at-scale
  - /responsible-use-of-github-models
---

264
content/github-models/use-github-models/evaluating-ai-models.md
Normal file
@@ -0,0 +1,264 @@
---
title: Evaluating AI models
intro: 'Test and compare AI model outputs using evaluators and scoring metrics in GitHub Models.'
versions:
  feature: github-models
shortTitle: Evaluate AI models
topics:
  - GitHub Models
---

## Overview

{% data variables.product.prodname_github_models %} provides a simple evaluation workflow that helps developers compare large language models (LLMs), refine prompts, and make data-driven decisions within the GitHub platform. You can use {% data variables.product.prodname_github_models %} to experiment with new features or validate model changes by analyzing performance, accuracy, and cost through structured evaluation tools.

## Use cases for {% data variables.product.prodname_github_models %}

Model behavior can vary widely based on the prompt, input, or configuration. {% data variables.product.prodname_github_models %} helps you:

* Test and compare multiple LLMs across realistic use cases.
* Optimize prompt phrasing, temperature, and other parameters.
* Evaluate model outputs using structured, repeatable metrics.
* Integrate AI development into your development workflow.

## Example scenario

Consider a scenario where you're building a feature to summarize customer feedback submitted through support tickets. These summaries will be used to generate internal reports and tickets, so the output needs to be clear, relevant, and concise.

You want to:

* Experiment with different models and prompt configurations.
* Evaluate the best-performing configuration based on quality, consistency, and efficiency.
* Save the configuration to your repository for reuse and collaboration.

## Prompt testing in the Playground

To familiarize yourself with how to create and manage prompts in {% data variables.product.prodname_github_models %}, see [Testing a prompt in the Playground](/github-models/use-github-models/optimizing-your-ai-powered-app-with-github-models#testing-a-prompt-in-the-playground).

The Playground lets you compare models side by side, adjust their parameters, and test prompt variations.

In this step, you'll configure a model to generate summaries for customer support feedback. You'll define a system prompt, test it with a sample input, and refine it to ensure the output is concise and relevant.

### Defining the system prompt

Define the model's behavior for your current goal. In this case, the goal is to summarize customer feedback. Under **Parameters**, enter the following System prompt:

You are a helpful assistant that summarizes support ticket responses into concise summaries.

Leave the remaining settings set to their defaults.



### Writing a user prompt

Now that the model is set up, enter the following customer feedback into the **Prompt** dialog:

The app crashes every time I try to upload a PDF from my phone. It works on desktop but not on mobile.

The model may generate a response like the following:

The user experiences consistent app crashes when attempting to upload a PDF from their phone. Uploading PDFs works normally on desktop. They request an investigation into the issue.

### Defining the input variable

At this point, the configuration generates a clear and concise summary. At the bottom of the **Parameters** settings, click **Create prompt.yml file** to open the Prompt view. Your system prompt will be prefilled automatically.

In the **User prompt** field, enter the following prompt:

```text
Summarize the following: {% raw %}{{input}}{% endraw %}
```

The {% raw %}`{{input}}`{% endraw %} variable acts as a placeholder that lets you test different inputs (customer feedback) without modifying the prompt each time. Each test input you add will replace {% raw %}`{{input}}`{% endraw %} when the comparison runs.
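
At this stage, the prompt file you are building corresponds roughly to the following sketch. The model shown is only a placeholder; you will compare and tune models in the next steps.

```yaml
# Rough sketch of the prompt file at this point; the model is a placeholder.
messages:
  - role: system
    content: You are a helpful assistant that summarizes support ticket responses into concise summaries.
  - role: user
    content: 'Summarize the following: {% raw %}{{input}}{% endraw %}'
model: gpt-4o
```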

## Adding test inputs

At the top of the Prompts view, select **Compare** to toggle the Comparisons view. This view allows you to run structured comparisons across multiple prompts or models and apply evaluators to measure performance.



In the Comparisons view, each row in the table represents a single test case, with a specific input and an expected output. Each column presents a different prompt configuration to compare how various models or prompt styles perform using evaluators.

Click **Add rows** to enter your test data. The input simulates a real support message, and the expected output represents the ideal summary the model should return. The table below provides sample test inputs and their corresponding expected outputs for evaluation.

| Row | **Input** | **Expected output** |
|-----|-----------|---------------------|
| 1 | The app crashes every time I try to upload a PDF from my phone. It works on desktop but not on mobile. | User reports that the mobile app crashes every time they try to upload a PDF, while the desktop version works without issues. |
| 2 | I contacted support two days ago and haven't heard back. I need help recovering my account ASAP. | User is awaiting support response and urgently needs account recovery assistance. |
| 3 | Please add dark mode. It's very hard to use at night. My eyes hurt after prolonged use. | User requests dark mode due to eye strain during nighttime use. |

### Adjusting model parameters

To the right of the table, click {% octicon "plus" aria-label="Add prompt configuration" %} to add a new prompt configuration.

Within the new prompt configuration, you can update the model and fine-tune its behavior using the available parameters settings. These settings control how the model generates text, including its length, randomness, and repetition.

### Configuring the model

From the Model dropdown, choose PHI-4 to create a distinct configuration for comparison.

You can adjust the following parameters to influence the model’s output:

* **Max Tokens**: Sets the maximum number of tokens the model can return. Higher values allow longer outputs.
* **Temperature**: Controls randomness in the response. Lower values (0.2–0.4) produce more focused, deterministic outputs. Higher values (0.8–1.0) introduce more variation and creativity.
* **Top P**: Controls output diversity by selecting from a pool of the most probable next words. Lower values reduce variability, similar to lowering temperature.
* **Presence Penalty**: Discourages the model from introducing new topics. Higher values apply a stronger penalty. A value of 0 is typically suitable for summarization.
* **Frequency Penalty**: Reduces the likelihood of repeating words. Higher values apply a stronger penalty. A value between 0 and 0.5 helps keep summaries clear and free of redundancy.
* **Stop**: Specifies one or more strings that, when generated, will cut off the model's response. Use this to prevent overly long outputs or enforce formatting rules.

The table below provides a parameter configuration for generating concise summaries during model comparison.

| Parameter | Value | Reason |
|-------------------|------------|-------------------------------------------------------------|
| Max Tokens | 128 | Keeps responses short and on-topic |
| Temperature | 0.3 | Ensures deterministic, focused outputs |
| Top P | 1.0 | Allows full vocabulary but keeps selection guided |
| Presence Penalty | 0 | No penalty—summaries don’t need topic variation |
| Frequency Penalty | 0.3 | Reduces repeated phrasing in compact summaries |
| Stop | (optional) | Use if you want to end output after a keyword or symbol |

After applying the parameters, you can add additional columns to compare more models or prompt configurations side by side.
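
If you commit this configuration later, these settings are captured under `modelParameters` in the prompt file. The sketch below assumes OpenAI-style snake_case key names; only `max_tokens` appears in the committed examples in these articles, so the other key names are assumptions.

```yaml
# Sketch of the parameter block; key names other than max_tokens are assumptions.
modelParameters:
  max_tokens: 128        # keeps responses short and on-topic
  temperature: 0.3       # focused, mostly deterministic output
  top_p: 1.0             # full vocabulary, guided selection
  presence_penalty: 0    # summaries don't need topic variation
  frequency_penalty: 0.3 # reduces repeated phrasing
```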

## Evaluating outputs

Once the prompt is configured, run a structured evaluation to compare model outputs using real data and repeatable metrics.

Model evaluation helps you understand how different models and prompt configurations perform across real inputs. In the Prompt view, you can apply evaluators to multiple models side by side and review metrics such as similarity, relevance, and groundedness.

The following evaluators are available:

* **Similarity**: Measures how closely a model's output matches an expected or reference answer. This is useful when you want to confirm that the model returns consistent and accurate responses aligned with a known result. The score ranges from 0 to 1, with higher values indicating greater similarity.
* **Relevance**: Refers to how effectively a response addresses a question. It assesses the accuracy, completeness, and direct relevance of the response based solely on the given information. The score ranges from 0 to 1, with higher values indicating stronger alignment with the input's intent.
* **Groundedness**: Measures how well an answer is anchored in the provided context, evaluating its relevance, accuracy, and completeness based exclusively on that context. It assesses the extent to which the answer fully addresses the question without introducing unrelated or incorrect information. The score ranges from 0 to 1, with higher values indicating higher accuracy.
* **Custom prompt**: Lets you define your own evaluation criteria for one LLM to assess the output of another. This allows you to score model outputs based on your own guidelines. You can choose between pass/fail or scored evaluations, making it ideal for scenarios where standard metrics do not capture testing expectations.

When you’re ready to evaluate, click **Run** to generate and compare the outputs across all prompt configurations. After the run completes, {% data variables.product.prodname_github_models %} displays the output from each prompt configuration, along with evaluator scores.



## Test case: PDF upload crash

**Input**: `The app crashes every time I try to upload a PDF from my phone. It works on desktop but not on mobile.`

The following tables display each model's output and its evaluator scores:

| Model | Output |
|-------|--------|
| GPT-4.1 | User reports that the app crashes when uploading a PDF from mobile, though uploading works on desktop. |
| DeepSeek-R1 | <think> Okay, the user is having trouble with the app crashing when they try to upload a PDF from their phone. It works on the desktop but not mobile. Let me break this down. First, the main issue is the app crashing specifically during PDF uploads on mobile. The fact that it works on desktop suggests it's a mobile-specific problem. Possible causes could be related to the mobile app version, the phone's OS, file size limits on mobile, or compatibility issues with certain PDF formats on mobile devices. I need to make sure the summary mentions the crash during PDF uploads on mobile, note that desktop works, and |
| Phi-4 | The app crashes when attempting to upload a PDF from the mobile device, though it functions correctly on the desktop version. |

| Model | Similarity | Relevance | Groundedness | Input tokens | Output tokens | Latency |
|---------------|------------|-----------|--------------|--------------|---------------|---------|
| GPT-4.1 | 100% | 50% | 100% | 61 | 20 | 918ms |
| DeepSeek-R1 | 50% | 50% | 75% | 52 | 128 | 2285ms |
| Phi-4 | 75% | 100% | 100% | 61 | 66 | 1117ms |

Use the evaluator scores to assess and compare responses beyond surface-level phrasing.

### Similarity

Evaluate how closely each model’s output aligns with the expected summary. The table below shows the similarity scores for each model.

| Model | Similarity score |
|---------------|------------------|
| GPT-4.1 | 100% |
| DeepSeek-R1 | 50% |
| Phi-4 | 75% |

While all models included the key content from the input, DeepSeek-R1’s similarity score is significantly lower due to its verbose internal commentary, which deviates from the expected concise summary format. In contrast, GPT-4.1’s response matches the phrasing and structure of the reference output.

### Relevance

Evaluate how well each model captures the core intent of the input. The table below shows the relevance scores for each model.

| Model | Relevance score |
|---------------|-----------------|
| GPT-4.1 | 50% |
| DeepSeek-R1 | 50% |
| Phi-4 | 100% |

All three models recognized the key issue of the app crashing during PDF uploads on mobile. Phi-4 earned a higher relevance score for reflecting the user's perspective more completely. DeepSeek-R1 lost points for introducing speculative technical causes not mentioned in the original input.

### Groundedness

Evaluate whether each model’s output stays true to the input without introducing unsupported information.
The table below shows the groundedness scores for each model.

| Model | Groundedness score |
|---------------|--------------------|
| GPT-4.1 | 100% |
| DeepSeek-R1 | 75% |
| Phi-4 | 100% |

Even though DeepSeek-R1 adds internal commentary, it does not introduce hallucinated facts. Its final summary sentence correctly reflects the original input.

## Test case: Dark mode request

**Input**: `Please add dark mode. It's very hard to use at night. My eyes hurt after prolonged use.`

The following tables display each model's output and its evaluator scores:

| Model | Output |
|-------|--------|
| GPT-4.1 | User requests the addition of a dark mode feature due to discomfort and eye strain when using the app at night. |
| DeepSeek-R1 | <think> Okay, the user wants dark mode added because it's hard to use at night and their eyes hurt. I need to make a concise summary. Let me start with the main request: adding dark mode. Then mention the reason: eye strain from prolonged use at night. Keep it to one sentence. Maybe structure it as "Request to implement dark mode to reduce eye strain during nighttime use." That covers the key points without extra details. Check if that's under the character limit. Yep, looks good. </think> Summary: Request to implement dark mode to reduce eye strain caused by prolonged nighttime use. |
| Phi-4 | The customer is requesting the addition of a dark mode feature to reduce eye strain when using the product at night. |

| Model | Similarity | Relevance | Groundedness | Input Tokens | Output Tokens | Latency |
|---------------|------------|-----------|--------------|---------------|----------------|---------|
| GPT-4.1 | 100% | 75% | 100% | 57 | 18 | 1286ms |
| DeepSeek-R1 | 50% | 0% | 25% | 49 | 128 | 1946ms |
| Phi-4 | 100% | 75% | 100% | 58 | 20 | 899ms |

### Similarity

Evaluate how closely each model’s output aligns with the expected summary. The table below shows the similarity scores for each model.

| Model | Similarity score |
|---------------|------------------|
| GPT-4.1 | 100% |
| DeepSeek-R1 | 50% |
| Phi-4 | 100% |

While all models included the key content from the input, DeepSeek-R1’s similarity score is again significantly lower due to its verbose internal commentary.

### Relevance

Evaluate how well each model captures the core intent of the input. The table below shows the relevance scores for each model.

| Model | Relevance score |
|---------------|-----------------|
| GPT-4.1 | 75% |
| DeepSeek-R1 | 0% |
| Phi-4 | 75% |

GPT-4.1 and Phi-4 both captured the main intent of the user’s request: the need for dark mode to reduce eye strain and improve usability at night. DeepSeek-R1 scored 0% in relevance due to its verbose internal commentary that distracted from the actual output.

### Groundedness

Evaluate whether each model’s output stays true to the input without introducing unsupported information.
The table below shows the groundedness scores for each model.

| Model | Groundedness score |
|---------------|--------------------|
| GPT-4.1 | 100% |
| DeepSeek-R1 | 25% |
| Phi-4 | 100% |

DeepSeek-R1 scored lower due to its verbose `<think>` block, which included speculative reasoning not present in the original input.

## Save the configuration

After completing your evaluations, the final step is to choose the model that performs best for your specific use case. In the examples above, Phi-4 and GPT-4.1 delivered strong, consistent results across all evaluators. DeepSeek-R1 scored lower due to its verbose reasoning and less focused outputs.

Once you've selected your preferred model and prompt configuration, add a descriptive name to the prompt file, then click **Commit changes**. This will save the model, prompt, parameter settings, and associated dataset as a reusable configuration file in your repository.
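
As a rough sketch, the committed file might bundle the prompt, parameters, test inputs, and evaluators along the following lines. The `testData` and `evaluators` field names are assumptions based on the prompt file format used by the [gh-models CLI extension](https://github.com/github/gh-models), not something prescribed by this article, and the model identifier is a placeholder.

```yaml
# Illustrative sketch only; testData, evaluators, and the model identifier are assumptions.
messages:
  - role: system
    content: You are a helpful assistant that summarizes support ticket responses into concise summaries.
  - role: user
    content: 'Summarize the following: {% raw %}{{input}}{% endraw %}'
model: gpt-4.1
modelParameters:
  max_tokens: 128
  temperature: 0.3
testData:
  - input: The app crashes every time I try to upload a PDF from my phone. It works on desktop but not on mobile.
    expected: User reports that the mobile app crashes every time they try to upload a PDF, while the desktop version works without issues.
evaluators:
  - name: similarity
    uses: github/similarity
```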



Committing your prompt configuration makes it easy to reuse, collaborate, and iterate across model setups. It makes it easier to rerun evaluations and track the performance of the prompt configuration over time.

## Further reading

* [AUTOTITLE](/github-models/use-github-models/storing-prompts-in-github-repositories)
* [AUTOTITLE](/github-models/use-github-models/integrating-ai-models-into-your-development-workflow)
12
content/github-models/use-github-models/index.md
Normal file
@@ -0,0 +1,12 @@
---
title: Use GitHub Models
intro: '{% data variables.product.prodname_github_models %} helps you go from prompt to production by testing, comparing, evaluating, and integrating AI directly in your repository.'
versions:
  feature: github-models
children:
  - /prototyping-with-ai-models
  - /optimizing-your-ai-powered-app-with-github-models
  - /evaluating-ai-models
  - /storing-prompts-in-github-repositories
  - /integrating-ai-models-into-your-development-workflow
---
@@ -36,7 +36,7 @@ You can use the {% data variables.product.prodname_actions %} token (`GITHUB_TOK

### Setting permissions

To use AI models in your workflows, you need to ensure that the `models` permission is enabled in your workflow configuration. This permission allows workflows to access the {% data variables.product.prodname_github_models %} inference API. You can either set this permission itself or use the general `read-all` or `write-all` permissions. See [AUTOTITLE](/rest/overview/permissions-required-for-github-apps?apiVersion=2022-11-28#repository-permissions-for-actions).
To use AI models in your workflows, ensure that the `models` permission is enabled in your workflow configuration. This permission allows workflows to access the {% data variables.product.prodname_github_models %} inference API. You can either set this permission itself or use the general `read-all` or `write-all` permissions. See [AUTOTITLE](/rest/overview/permissions-required-for-github-apps?apiVersion=2022-11-28#repository-permissions-for-actions).
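
For example, a minimal sketch of the permissions block for a workflow that calls AI models might look like this:

```yaml
permissions:
  contents: read
  models: read
```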

### Writing your workflow file

@@ -81,7 +81,7 @@ You can use the {% data variables.product.prodname_github_models %} extension fo

### Prerequisites

To use the {% data variables.product.prodname_github_models %} CLI extension, you need to have {% data variables.product.prodname_cli %} installed. {% data reusables.cli.cli-installation %}
To use the {% data variables.product.prodname_github_models %} CLI extension, you must install {% data variables.product.prodname_cli %}. {% data reusables.cli.cli-installation %}

### Installing the extension

@@ -0,0 +1,173 @@
---
title: Optimizing your AI-powered app with Models
shortTitle: Optimize your AI-powered app
intro: 'Learn how to test models and refine prompts for your AI-powered application.'
versions:
  feature: github-models
allowTitleToDifferFromFilename: true
---

With new AI models being released regularly, choosing the right one for your application can be challenging. {% data variables.product.prodname_github_models %} helps you optimize your AI-powered application by letting you **compare different models and prompt variations** against sample inputs, while using **built-in evaluators** to validate model output.

Through an example scenario, we'll build an AI-powered assistant that helps users learn how to use Git from the command line. We'll walk through comparing different models, and you'll learn how to refine prompt variations to enhance output quality.

>[!NOTE]
>
> * {% data variables.product.prodname_github_models %} is in {% data variables.release-phases.public_preview %} and subject to change.
> * Usage is rate limited. See [AUTOTITLE](/github-models/use-github-models/prototyping-with-ai-models#rate-limits).

## Testing a prompt

The {% data variables.product.prodname_github_models %} **Comparisons** view allows you to adjust model parameters and prompts to test model output.

### 1. Creating a sample repository

You can access the **Comparisons** view directly from the **Models** tab in any repository, but in this guide we'll create a new repository to use as a test environment.

1. Navigate to the [new repository](https://github.com/new) page.
1. Under "Owner", make sure your user account is selected.
1. In the "Repository name" field, type `models-playground`.
1. Beneath the description field, select **Private** to set the repository visibility.
1. Click **Create repository**.

### 2. Creating a new prompt

1. On the main page of your new repository, click the **Models** tab.
1. In the "Prompts" section, click **{% octicon "plus" aria-hidden="true" %} New Prompt**.
1. In the upper-left corner, choose a model from the dropdown menu.

### 3. Writing a system prompt

A system prompt is a set of instructions that defines the role, behavior, and limitations of an AI model before it interacts with users. In this example, we'll be working on an AI-powered application that explains how to use Git from the command line.

In the **System prompt** field, copy and paste the following text:

```text copy
You are an expert at using the Git version control system. I will ask questions looking for guidance on the best way to perform tasks using Git, and you will give clear, step-by-step answers that explain each step you are recommending.
```

> [!NOTE] If the **System** text field is not editable, try choosing a different model in the model dropdown above. Not all models allow the system prompt to be modified.

### 4. Writing a user prompt

The user prompt is the direct question or instruction that a user gives to an AI system during a conversation, and that the AI system responds to.

In the **User prompt** field, copy and paste the following text:

```text copy
I want to learn how to use Git from the command line. {{input}}
```

### 5. Entering sample input

The {% raw %}`{{input}}`{% endraw %} variable in the **User prompt** functions as a placeholder for sample input. To manage this placeholder, click the **Variables** button and enter the following text:

```text copy
When should I use rebase or merge?
```

### 6. Running the sample prompt

1. In the upper-right corner, click **{% octicon "play" aria-hidden="true" %} Play**.
1. Make a change to the model or prompt, then run the prompt again to see what results you get.

## Testing different models against a prompt

Now, let's determine which model will work best for our application, using the **Comparisons** view. This view allows you to test different models on the same input, revealing differences in accuracy, creativity, tone, reasoning, and reliability. That will help us choose the model that best fits our needs in terms of quality, speed, cost, and consistency.

1. In the upper-left corner, click **Comparisons**.
1. To compare different models, click **{% octicon "plus" aria-hidden="true" %} Add prompt** and select **{% octicon "repo-forked" aria-hidden="true" %} Copy original prompt** to duplicate your existing system and user prompts. Create at least two copies of the original prompt to evaluate three different models.
1. Next to each prompt, click {% octicon "pencil" aria-label="Edit prompt" %}. From the **Model** dropdown, choose a different model for each prompt you created.
1. Click **{% octicon "plus" aria-hidden="true" %} Add inputs** to create new rows for sample inputs.

   * Click **{% octicon "plus" aria-hidden="true" %} Add inputs**. Then, in the "Input" field, copy and paste the following text:

     ```text copy
     How do I modify the most recent commit message in my current branch?
     ```

   * Click **{% octicon "plus" aria-hidden="true" %} Add inputs** again, then paste the following input:

     ```text copy
     How do I move a specific commit from one branch to a different branch?
     ```

   * Click **{% octicon "plus" aria-hidden="true" %} Add inputs** one more time, and paste this input:

     ```text copy
     How do I find the author of a specific commit in a repository's history?
     ```

1. To run the prompts, in the upper-right corner, click **{% octicon "play" aria-hidden="true" %} Run**.
1. Try out different models against your prompt, making note of the **Latency** and the **Input** and **Output** token usage of the different models.

## Testing prompt variations with a specific model

If you’re building an application with a specific AI model, you need responses to be predictable and reliable. Testing prompt variations helps you to:

* **Optimize performance and quality**: Slight changes in phrasing can affect the response quality. By testing variations you can find the wording that gets the best response.
* **Clarify instructions**: By varying the phrasing of your prompt, you can identify which version the model understands most clearly.
* **Adapt to specific model behavior**: You can tailor your input to how a specific model interprets language.
* **Verify the format of the output**: You may want a list, a paragraph, a code block, or a specific tone. Testing prompt variation helps you enforce a specific structure or style.

Now, let's use {% data variables.product.prodname_github_models %} to test prompt variations against user input for your specific model.

### 1. Adding prompt variations

For this example scenario, select the **same model for each column**, but provide a different prompt variation by editing the "User prompt" field of our existing prompts. Next to each prompt, click {% octicon "pencil" aria-label="Edit prompt" %}.

1. Click **{% octicon "pencil" aria-label="Edit prompt" %}** in "Prompt 2." Then, in the "User prompt" field, copy and paste the following text:

   ```text copy
   I want to learn how to use Git from the command line, but explain it to me like I am five years old. {{input}}
   ```

1. Click **{% octicon "pencil" aria-label="Edit prompt" %}** in "Prompt 3", then paste the following input:

   ```text copy
   I want to learn how to use Git from the command line. Give me instructions in the form of a haiku. {{input}}
   ```

### 2. Running prompt variations

1. To run the prompts, in the upper-right corner, click **{% octicon "play" aria-hidden="true" %} Run**.
1. Try out different prompt variations, and compare the types of output the model gives you.

## Evaluating model output

We’ve tested different models and prompt variations in {% data variables.product.prodname_github_models %}, and the next step is interpreting and comparing the results to make informed decisions for our AI-powered application.

As you ran the models in the example scenarios, the **Input** and **Output** token usage and **Latency** were displayed after each run. Token usage matters because it directly affects **cost, performance, and model limitations.**

* Since most models charge per token both for input and output, using more tokens increases your cost.
* Each model also has a maximum token limit (called a context window), and exceeding it can result in errors or truncated responses.
* Longer prompts can slow down response time or reduce clarity, while concise prompts often lead to better, more efficient outputs.

Using {% data variables.product.prodname_github_models %} to test token usage and latency helps you stay within limits, manage costs, and improve overall effectiveness of your AI-powered application.

### Using evaluators to judge output

Depending on the number of prompts and models you are evaluating, the model output can be overwhelming to sort through manually. To help assess the quality of each model's output, you can use **Evaluators** to score results across key dimensions like clarity, accuracy, and relevance. You can define your own evaluation criteria, or use built-in evaluators to automatically rate outputs, making it easier to identify the best-performing model and prompt variation.

For this example scenario, let's use the **String check** evaluator to check for a string in the output.

1. In the lower-right corner of the **Prompts** field, click **{% octicon "plus" aria-hidden="true" %} Add evaluator** and select **{% octicon "note" aria-hidden="true" %} String check**.
1. In the **Name** field, enter "Amend check", then copy and paste the following input for the **Value** field:

   ```text copy
   git commit --amend
   ```

1. To run the prompts, in the upper-right corner, click **{% octicon "play" aria-hidden="true" %} Run**.
1. The prompt output will now show a **Pass** or **Fail** label, letting you know which model contained the required string.
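
If you commit the prompt, a check like this can also travel with it in the prompt file. The sketch below assumes the evaluator syntax used by the [gh-models CLI extension](https://github.com/github/gh-models); the `evaluators`, `string`, and `contains` field names are assumptions rather than something defined in this article.

```yaml
# Sketch only; evaluator field names are assumptions based on the gh-models prompt format.
evaluators:
  - name: Amend check
    string:
      contains: git commit --amend
```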

To learn more about the pre-built evaluators like similarity, groundedness, and relevance, see [AUTOTITLE](/github-models/use-github-models/evaluating-ai-models#evaluating-outputs).

## Next steps

Now that you've explored how to test models, refine prompts, and evaluate outputs using {% data variables.product.prodname_github_models %}, you're ready to start building prompts for your AI-powered application. After creating a prompt, you can store, version, and share it by committing a `.prompt.yml` file to your repository. This keeps your prompts under version control, and enables easy collaboration on prompt and model refinement. For more information, see [AUTOTITLE](/github-models/use-github-models/storing-prompts-in-github-repositories).

## Join the community

To ask questions and share feedback, see this [GitHub Models discussion post](https://github.com/orgs/community/discussions/159087).
To learn how others are using {% data variables.product.prodname_github_models %}, visit the [GitHub Community discussions for Models](https://github.com/orgs/community/discussions/categories/models).
@@ -4,6 +4,8 @@ shortTitle: Prototype with AI models
intro: 'Find and experiment with AI models for free.'
versions:
  feature: github-models
redirect_from:
  - /github-models/prototyping-with-ai-models
---

If you want to develop a generative AI application, you can use {% data variables.product.prodname_github_models %} to find and experiment with AI models for free. Once you are ready to bring your application to production, you can switch to a token from a paid Azure account. See the [Azure AI](https://aka.ms/azureai/github-models) documentation.
@@ -43,6 +45,14 @@ With one model open in the playground, click **Compare**, then, in the dropdown

Any parameters you set are used for both models.

## Evaluating AI models

Once you've started testing prompts in the playground, you can evaluate model performance using structured metrics. Evaluations help you compare multiple prompt configurations across different models and determine which setup performs best.

In the Comparisons view, you can apply evaluators like similarity, relevance, and groundedness to measure how well each output meets your expectations. You can also define your own evaluation criteria with a custom prompt evaluator.

For step-by-step instructions, see [Evaluating outputs](/github-models/use-github-models/evaluating-ai-models#evaluating-outputs).

## Experimenting with AI models using the API

>[!NOTE]
@@ -61,7 +71,7 @@ The steps to use each model are similar. In general, you will need to:
1. Optionally, use the language dropdown to select the programming language.
1. Optionally, use the SDK dropdown to select which SDK to use.

All models can be used with the Azure AI Inference SDK, and some models support additional SDKs. If you want to easily switch between models, you should select "Azure AI Inference SDK". If you selected "REST" as the language, you won't use an SDK. Instead, you will use the API endpoint directly.
All models can be used with the Azure AI Inference SDK, and some models support additional SDKs. If you want to easily switch between models, you should select "Azure AI Inference SDK". If you selected "REST" as the language, you won't use an SDK. Instead, you will use the API endpoint directly. {% ifversion fpt %} See [{% data variables.product.prodname_github_models %} REST API](/rest/models?apiVersion=2022-11-28). {% endif %}
1. Either open a codespace, or set up your local environment:
   * To run in a codespace, click **{% octicon "codespaces" aria-hidden="true" %} Run codespace**, then click **Create new codespace**.
   * To run locally:
@@ -347,4 +357,5 @@ These limits are subject to change without notice.

## Leaving feedback

To leave feedback about {% data variables.product.prodname_github_models %}, start a new discussion or comment on an existing discussion in the [GitHub Community](https://github.com/orgs/community/discussions/categories/models).
To ask questions and share feedback, see this [GitHub Models discussion post](https://github.com/orgs/community/discussions/159087).
To learn how others are using {% data variables.product.prodname_github_models %}, visit the [GitHub Community discussions for Models](https://github.com/orgs/community/discussions/categories/models).
@@ -0,0 +1,60 @@
---
title: Storing prompts in GitHub repositories
shortTitle: Store prompts
intro: 'Store prompts directly in your GitHub repositories to leverage automated text summarization and other AI-driven functionalities.'
allowTitleToDifferFromFilename: true
versions:
  fpt: '*'
  ghec: '*'
---

Prompts can be stored as files directly within GitHub repositories. This unlocks the ability to view your prompts in an organized UI, share them with non-technical stakeholders, and run seamless iterations and comparisons on adjustments to models and prompts.

## Benefits

* Easy integration with the new suite of AI development tools directly on GitHub.
* Scales from simple to complex use cases.
* Uses a widely supported format, compatible with existing tools.

## Supported file format

Store prompts in markdown files with optional YAML front matter.

The file can be located anywhere in your repository, but it **must have the extension `.prompt.md`**.

Example:

```yaml
---
name: Summarizer
description: Summarizes a given text
model: openai/gpt-4o
model_parameters:
  temperature: 0.5
---
system:
You are a text summarizer. Your only job is to summarize the text given to you.
user:
Summarize the given text:
<text>
{% raw %}{{text}}{% endraw %}
</text>
```

## Prompt structure

Prompts have two key parts:

* **Runtime information** (required)
  * Prompt templates (system, user, etc.) using simple {{variable}} placeholders
* **Development information** (optional)
  * Human-readable name and description
  * Model identifier and parameters
  * Sample data for testing and evaluations

## Limitations

You cannot store prompts for:

* Complex templating languages
* Proprietary or complex file formats (such as `.ghprompt`, or `.prompty`)