E2E

This package contains the repository-level end-to-end tests for Dify.

This file is the canonical package guide for e2e/. Keep detailed workflow, architecture, debugging, and reporting documentation here. Keep README.md as a minimal pointer to this file so the two documents do not drift.

The suite uses Cucumber for scenario definitions and Playwright as the browser execution layer.

It tests:

  • backend API started from source
  • frontend served from the production artifact
  • middleware services started from Docker

Prerequisites

  • Node.js ^22.22.1
  • pnpm
  • uv
  • Docker

Run the following commands from the repository root.

Install dependencies and Playwright browsers once, then verify the package:

pnpm install
pnpm -C e2e e2e:install
pnpm -C e2e check

pnpm install is resolved through the repository workspace and uses the shared root lockfile plus pnpm-workspace.yaml.

Use pnpm -C e2e check as the default local verification step after editing E2E TypeScript, Cucumber support code, or feature glue. It runs formatting, linting, and type checks for this package.

Common commands:

# authenticated-only regression (default excludes @fresh)
# expects backend API, frontend artifact, and middleware stack to already be running
pnpm -C e2e e2e

# full reset + fresh install + authenticated scenarios
# starts required middleware/dependencies for you
pnpm -C e2e e2e:full

# run a tagged subset
pnpm -C e2e e2e -- --tags @smoke

# headed browser
pnpm -C e2e e2e:headed -- --tags @smoke

# slow down browser actions for local debugging
E2E_SLOW_MO=500 pnpm -C e2e e2e:headed -- --tags @smoke

Frontend artifact behavior:

  • if web/.next/BUILD_ID exists, E2E reuses the existing build by default
  • if you set E2E_FORCE_WEB_BUILD=1, E2E rebuilds the frontend before starting it
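
The reuse rule above can be sketched as a small decision function. This is an illustration only, not the code in support/web-server.ts: the name `shouldBuildWeb`, its parameters, and the injectable `exists` check are hypothetical, and it assumes that a missing BUILD_ID always means a build is required.

```typescript
import { existsSync } from 'node:fs'
import { join } from 'node:path'

// Hypothetical helper: decides whether the frontend must be built before startup.
// Assumption: a missing web/.next/BUILD_ID means no reusable build exists.
function shouldBuildWeb(
  webDir: string,
  env: Record<string, string | undefined> = process.env,
  exists: (path: string) => boolean = existsSync,
): boolean {
  // An explicit override always forces a rebuild.
  if (env.E2E_FORCE_WEB_BUILD === '1') return true
  // Otherwise reuse the build whenever the marker file is present.
  return !exists(join(webDir, '.next', 'BUILD_ID'))
}
```

Passing the environment and the existence check as parameters keeps the decision easy to exercise without touching the real filesystem.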

Lifecycle

flowchart TD
  A["Start E2E run"] --> B["run-cucumber.ts orchestrates setup/API/frontend"]
  B --> C["support/web-server.ts starts or reuses frontend directly"]
  C --> D["Cucumber loads config, steps, and support modules"]
  D --> E["BeforeAll bootstraps shared auth state via /install"]
  E --> F{"Which command is running?"}
  F -->|`pnpm e2e`| G["Run config default tags: not @fresh and not @skip"]
  F -->|`pnpm e2e:full*`| H["Override tags to not @skip"]
  G --> I["Per-scenario BrowserContext from shared browser"]
  H --> I
  I --> J["Failure artifacts written to cucumber-report/artifacts"]

Ownership is split like this:

  • scripts/setup.ts is the single environment entrypoint for reset, middleware, backend, and frontend startup
  • scripts/run-cucumber.ts orchestrates the E2E run and Cucumber invocation
  • support/web-server.ts manages frontend reuse, startup, readiness, and shutdown
  • features/support/hooks.ts manages auth bootstrap, scenario lifecycle, and diagnostics
  • features/support/world.ts owns per-scenario typed context
  • features/step-definitions/ holds domain-oriented glue so the official VS Code Cucumber plugin works with default conventions when e2e/ is opened as the workspace root

Package layout:

  • features/: Gherkin scenarios grouped by capability
  • features/step-definitions/: domain-oriented step definitions
  • features/support/hooks.ts: suite lifecycle, auth-state bootstrap, diagnostics
  • features/support/world.ts: shared scenario context
  • support/web-server.ts: typed frontend startup/reuse logic
  • scripts/setup.ts: reset and service lifecycle commands
  • scripts/run-cucumber.ts: Cucumber orchestration entrypoint

Behavior depends on instance state:

  • uninitialized instance: completes install and stores authenticated state
  • initialized instance: signs in and reuses authenticated state

Because of that, the @fresh install scenario only runs in the pnpm e2e:full* flows. The default pnpm e2e* flows exclude @fresh via Cucumber config tags so they can be re-run against an already initialized instance.
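
The tag rules can be summarized in a few lines. The two tag expressions come from the lifecycle flowchart; the helper names `tagExpressionFor` and `wouldRun` are hypothetical and only model the selection logic, not how scripts/run-cucumber.ts passes tags to Cucumber.

```typescript
// Tag expressions as described in the lifecycle flowchart.
const DEFAULT_TAGS = 'not @fresh and not @skip'
const FULL_TAGS = 'not @skip'

// Hypothetical helper: pick the tag expression for a package script name.
function tagExpressionFor(script: string): string {
  return script.startsWith('e2e:full') ? FULL_TAGS : DEFAULT_TAGS
}

// Hypothetical helper: would a scenario with these tags be selected?
function wouldRun(scenarioTags: readonly string[], script: string): boolean {
  if (scenarioTags.includes('@skip')) return false
  const isFullRun = script.startsWith('e2e:full')
  return isFullRun || !scenarioTags.includes('@fresh')
}
```

For example, a scenario tagged `@fresh` is selected by `e2e:full` but not by `e2e` or `e2e:headed`.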

Reset all persisted E2E state:

pnpm -C e2e e2e:reset

This removes:

  • docker/volumes/db/data
  • docker/volumes/redis/data
  • docker/volumes/weaviate
  • docker/volumes/plugin_daemon
  • e2e/.auth
  • e2e/.logs
  • e2e/cucumber-report
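
A minimal sketch of what such a reset amounts to, assuming it only deletes the listed paths relative to the repository root. The function name `resetE2EState` and the injectable `remove` callback are hypothetical; the real scripts/setup.ts may do more, such as stopping containers first.

```typescript
import { rmSync } from 'node:fs'
import { join } from 'node:path'

// Paths listed above, relative to the repository root.
const RESET_TARGETS = [
  'docker/volumes/db/data',
  'docker/volumes/redis/data',
  'docker/volumes/weaviate',
  'docker/volumes/plugin_daemon',
  'e2e/.auth',
  'e2e/.logs',
  'e2e/cucumber-report',
]

// Hypothetical helper: removes every target and returns the absolute paths it touched.
function resetE2EState(
  repoRoot: string,
  // force: true makes removal a no-op when the path does not exist.
  remove: (path: string) => void = (path) => rmSync(path, { recursive: true, force: true }),
): string[] {
  return RESET_TARGETS.map((relative) => {
    const absolute = join(repoRoot, relative)
    remove(absolute)
    return absolute
  })
}
```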

Start the full middleware stack:

pnpm -C e2e e2e:middleware:up

Stop the full middleware stack:

pnpm -C e2e e2e:middleware:down

The middleware stack includes:

  • PostgreSQL
  • Redis
  • Weaviate
  • Sandbox
  • SSRF proxy
  • Plugin daemon

Fresh install verification:

pnpm -C e2e e2e:full

Run the Cucumber suite against an already running middleware stack:

pnpm -C e2e e2e:middleware:up
pnpm -C e2e e2e
pnpm -C e2e e2e:middleware:down

Artifacts and diagnostics:

  • cucumber-report/report.html: HTML report
  • cucumber-report/report.json: JSON report
  • cucumber-report/artifacts/: failure screenshots and HTML captures
  • .logs/cucumber-api.log: backend startup log
  • .logs/cucumber-web.log: frontend startup log

Open the HTML report locally with:

open e2e/cucumber-report/report.html

Writing new scenarios

Workflow

  1. Create a .feature file under features/<capability>/
  2. Add step definitions under features/step-definitions/<capability>/
  3. Reuse existing steps from common/ and other definition files before writing new ones
  4. Run with pnpm -C e2e e2e -- --tags @your-tag to verify
  5. Run pnpm -C e2e check before committing

Feature file conventions

Tag every feature or scenario with a capability tag. Add auth tags only when they clarify intent or change the browser session behavior:

@datasets @authenticated
Feature: Create dataset
  Scenario: Create a new empty dataset
    Given I am signed in as the default E2E admin
    When I open the datasets page
    ...

  • Capability tags (@apps, @auth, @datasets, …) group related scenarios for selective runs
  • Auth/session tags:
    • default behavior — scenarios run with the shared authenticated storageState unless marked otherwise
    • @unauthenticated — uses a clean BrowserContext with no cookies or storage
    • @authenticated — optional intent tag for readability or selective runs; it does not currently change hook behavior on its own
  • @fresh — only runs in e2e:full mode (requires uninitialized instance)
  • @skip — excluded from all runs
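
The session rules can be expressed as a function from scenario tags to browser context options. This is a sketch under assumptions: `contextOptionsFor` and the auth state path are hypothetical, and the returned object only mirrors the `storageState` option that Playwright's `browser.newContext()` accepts. The actual logic lives in features/support/hooks.ts.

```typescript
// Subset of Playwright's context options that matters here.
interface ContextOptions {
  storageState?: string
}

// Hypothetical helper: choose context options from a scenario's tags.
function contextOptionsFor(tags: readonly string[], authStatePath: string): ContextOptions {
  // @unauthenticated scenarios get a clean context with no cookies or storage.
  if (tags.includes('@unauthenticated')) return {}
  // Everything else, with or without @authenticated, reuses the shared auth state.
  return { storageState: authStatePath }
}
```

Note that `@authenticated` does not appear in the function at all, which matches its role as an intent tag only.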

Keep scenarios short and declarative. Each step should describe what the user does, not how the UI works.

Step definition conventions

import { When, Then } from '@cucumber/cucumber'
import { expect } from '@playwright/test'
import type { DifyWorld } from '../../support/world'

When('I open the datasets page', async function (this: DifyWorld) {
  await this.getPage().goto('/datasets')
})

Rules:

  • Always type this as DifyWorld for proper context access
  • Use async function (not arrow functions — Cucumber binds this)
  • One step = one user-visible action or one assertion
  • Keep steps stateless across scenarios; use DifyWorld properties for in-scenario state
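
The rule about arrow functions follows from how `this` works in JavaScript. Cucumber calls each step with the scenario's World as `this`, and an arrow function ignores that binding. The sketch below shows the difference in plain TypeScript; `World` and `StepFile` are stand-ins invented for the demonstration, not part of the suite.

```typescript
// Stand-in for DifyWorld; only the shape needed for the demonstration.
interface World {
  label: string
}

class StepFile {
  label = 'enclosing scope'

  // Arrow function: `this` is captured lexically and cannot be rebound.
  arrowStep() {
    return (): string => this.label
  }

  // Regular function: `this` is whatever the caller binds.
  functionStep() {
    return function (this: World): string {
      return this.label
    }
  }
}

const world: World = { label: 'scenario world' }
const steps = new StepFile()

// A runner invokes each step with the scenario's World as `this`.
const fromFunction = steps.functionStep().call(world) // 'scenario world'
const fromArrow = steps.arrowStep().call(world) // 'enclosing scope'
```

In a real step file the arrow version would not see the DifyWorld instance, so calls such as `this.getPage()` would fail.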

Locator priority

Follow the Playwright recommended locator strategy, in order of preference:

| Priority | Locator | Example | When to use |
| --- | --- | --- | --- |
| 1 | getByRole | getByRole('button', { name: 'Create' }) | Default choice: accessible and resilient |
| 2 | getByLabel | getByLabel('App name') | Form inputs with visible labels |
| 3 | getByPlaceholder | getByPlaceholder('Enter name') | Inputs without visible labels |
| 4 | getByText | getByText('Welcome') | Static text content |
| 5 | getByTestId | getByTestId('workflow-canvas') | Only when no semantic locator works |

Avoid raw CSS/XPath selectors. They break when the DOM structure changes.

Assertions

Use @playwright/test expect — it auto-waits and retries until the condition is met or the timeout expires:

// URL assertion
await expect(page).toHaveURL(/\/datasets\/[a-f0-9-]+\/documents/)

// Element visibility
await expect(page.getByRole('button', { name: 'Save' })).toBeVisible()

// Element state
await expect(page.getByRole('button', { name: 'Submit' })).toBeEnabled()

// Negation
await expect(page.getByText('Loading')).not.toBeVisible()

Do not use manual waitForTimeout or polling loops. If you need a longer wait for a specific assertion, pass { timeout: 30_000 } to the assertion.

Cucumber expressions

Use Cucumber expression parameter types to extract values from Gherkin steps:

| Type | Pattern | Example step |
| --- | --- | --- |
| {string} | Quoted string | I select the "Workflow" app type |
| {int} | Integer | I should see {int} items |
| {float} | Decimal | the progress is {float} percent |
| {word} | Single word | I click the {word} tab |
Prefer {string} for UI labels, names, and text content — it maps naturally to Gherkin's quoted values.
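
To make the mapping concrete, the sketch below shows how a parameter type turns matched text into a typed argument. It is a deliberately simplified stand-in, not Cucumber's implementation: `matchStep` and `PARAMETER_TYPES` are hypothetical, and the patterns cover only the common cases (for example, only double-quoted strings).

```typescript
type Arg = string | number

// Simplified patterns; each one contains exactly one capture group.
const PARAMETER_TYPES: Record<string, { pattern: string; convert: (raw: string) => Arg }> = {
  string: { pattern: '"([^"]*)"', convert: (raw) => raw },
  int: { pattern: '(-?\\d+)', convert: (raw) => Number.parseInt(raw, 10) },
  float: { pattern: '(-?\\d*\\.?\\d+)', convert: (raw) => Number.parseFloat(raw) },
  word: { pattern: '(\\S+)', convert: (raw) => raw },
}

// Hypothetical helper: returns typed arguments, or null when the text does not match.
function matchStep(expression: string, text: string): Arg[] | null {
  const converters: Array<(raw: string) => Arg> = []
  const source = expression
    // Escape regex metacharacters in the literal text (braces are left alone).
    .replace(/[.*+?^$()|[\]\\]/g, '\\$&')
    // Swap each {type} placeholder for its pattern and remember its converter.
    .replace(/\{(\w+)\}/g, (placeholder, name: string) => {
      const type = PARAMETER_TYPES[name]
      if (!type) throw new Error(`Unknown parameter type: ${placeholder}`)
      converters.push(type.convert)
      return type.pattern
    })
  const match = new RegExp(`^${source}$`).exec(text)
  return match ? match.slice(1).map((raw, i) => converters[i](raw)) : null
}
```

With this sketch, `matchStep('I should see {int} items', 'I should see 3 items')` yields the number `3` rather than the string `'3'`, which is the behavior step definitions rely on when they declare a `number` parameter.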

Scoping locators

When the page has multiple similar elements, scope locators to a container:

When('I fill in the app name in the dialog', async function (this: DifyWorld) {
  const dialog = this.getPage().getByRole('dialog')
  await dialog.getByPlaceholder('Give your app a name').fill('My App')
})

Failure diagnostics

On failure, the After hook automatically captures:

  • Full-page screenshot (PNG)
  • Page HTML dump
  • Console errors and page errors

Artifacts are saved to cucumber-report/artifacts/ and attached to the HTML report. No extra code is needed in step definitions.

Reusing existing steps

Before writing a new step definition, inspect the existing step definition files first. Reuse a matching step when the wording and behavior already fit, and only add a new step when the scenario needs a genuinely new user action or assertion. Steps in common/ are designed for broad reuse across all features.

Browse the step definition files directly:

  • features/step-definitions/common/ — auth guards and navigation assertions shared by all features
  • features/step-definitions/<capability>/ — domain-specific steps scoped to a single feature area