E2E
This package contains the repository-level end-to-end tests for Dify.
This file is the canonical package guide for e2e/. Keep detailed workflow, architecture, debugging, and reporting documentation here. Keep README.md as a minimal pointer to this file so the two documents do not drift.
The suite uses Cucumber for scenario definitions and Playwright as the browser execution layer.
It tests:
- backend API started from source
- frontend served from the production artifact
- middleware services started from Docker
Prerequisites
- Node.js `^22.22.1`
- `pnpm`
- `uv`
- Docker
Run the following commands from the repository root.
Install dependencies and Playwright browsers once, then run the package checks:
```bash
pnpm install
pnpm -C e2e e2e:install
pnpm -C e2e check
```
`pnpm install` is resolved through the repository workspace and uses the shared root lockfile plus `pnpm-workspace.yaml`.
Use pnpm check as the default local verification step after editing E2E TypeScript, Cucumber support code, or feature glue. It runs formatting, linting, and type checks for this package.
Common commands:
```bash
# authenticated-only regression (default excludes @fresh)
# expects backend API, frontend artifact, and middleware stack to already be running
pnpm -C e2e e2e

# full reset + fresh install + authenticated scenarios
# starts required middleware/dependencies for you
pnpm -C e2e e2e:full

# run a tagged subset
pnpm -C e2e e2e -- --tags @smoke

# headed browser
pnpm -C e2e e2e:headed -- --tags @smoke

# slow down browser actions for local debugging
E2E_SLOW_MO=500 pnpm -C e2e e2e:headed -- --tags @smoke
```
Frontend artifact behavior:
- if `web/.next/BUILD_ID` exists, E2E reuses the existing build by default
- if you set `E2E_FORCE_WEB_BUILD=1`, E2E rebuilds the frontend before starting it
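The reuse rule above fits in a few lines. This is an illustrative sketch, not the code in `support/web-server.ts`: the name `shouldBuildWeb` and the injectable `exists` check are hypothetical, and it assumes a missing `BUILD_ID` means there is no build to reuse.

```typescript
import { existsSync } from 'node:fs'
import { join } from 'node:path'

// Hypothetical helper (not the suite's real API): decides whether the
// frontend must be rebuilt before an E2E run.
export function shouldBuildWeb(
  repoRoot: string,
  env: Record<string, string | undefined> = process.env,
  exists: (path: string) => boolean = existsSync,
): boolean {
  // An explicit opt-in always forces a fresh build.
  if (env['E2E_FORCE_WEB_BUILD'] === '1')
    return true
  // Otherwise reuse the artifact when web/.next/BUILD_ID is present.
  // Assumption: a missing BUILD_ID means there is nothing to reuse.
  return !exists(join(repoRoot, 'web', '.next', 'BUILD_ID'))
}
```

Injecting `exists` keeps the decision testable without creating a real `.next` directory.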
Lifecycle
```mermaid
flowchart TD
  A["Start E2E run"] --> B["run-cucumber.ts orchestrates setup/API/frontend"]
  B --> C["support/web-server.ts starts or reuses frontend directly"]
  C --> D["Cucumber loads config, steps, and support modules"]
  D --> E["BeforeAll bootstraps shared auth state via /install"]
  E --> F{"Which command is running?"}
  F -->|`pnpm e2e`| G["Run config default tags: not @fresh and not @skip"]
  F -->|`pnpm e2e:full*`| H["Override tags to not @skip"]
  G --> I["Per-scenario BrowserContext from shared browser"]
  H --> I
  I --> J["Failure artifacts written to cucumber-report/artifacts"]
```
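The tag selection in the flowchart can be sketched in plain TypeScript. `tagExpressionFor` and `isScenarioSelected` are hypothetical names; in the real suite Cucumber evaluates the tag expression itself, and a scenario's effective tags include those inherited from its feature.

```typescript
// Hypothetical model of the two run modes described above.
export type RunMode = 'default' | 'full'

// Tag expressions as shown in the flowchart; Cucumber evaluates these itself.
export function tagExpressionFor(mode: RunMode): string {
  return mode === 'full' ? 'not @skip' : 'not @fresh and not @skip'
}

// The same rule applied directly to a scenario's effective tags
// (its own tags plus those inherited from the feature).
export function isScenarioSelected(mode: RunMode, tags: readonly string[]): boolean {
  if (tags.includes('@skip'))
    return false
  return mode === 'full' || !tags.includes('@fresh')
}
```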
Ownership is split like this:
- `scripts/setup.ts` is the single environment entrypoint for reset, middleware, backend, and frontend startup
- `scripts/run-cucumber.ts` orchestrates the E2E run and Cucumber invocation
- `support/web-server.ts` manages frontend reuse, startup, readiness, and shutdown
- `features/support/hooks.ts` manages auth bootstrap, scenario lifecycle, and diagnostics
- `features/support/world.ts` owns the per-scenario typed context
- `features/step-definitions/` holds domain-oriented glue, so the official VS Code Cucumber plugin works with default conventions when `e2e/` is opened as the workspace root
Package layout:
- `features/`: Gherkin scenarios grouped by capability
- `features/step-definitions/`: domain-oriented step definitions
- `features/support/hooks.ts`: suite lifecycle, auth-state bootstrap, diagnostics
- `features/support/world.ts`: shared scenario context
- `support/web-server.ts`: typed frontend startup/reuse logic
- `scripts/setup.ts`: reset and service lifecycle commands
- `scripts/run-cucumber.ts`: Cucumber orchestration entrypoint
Auth bootstrap behavior depends on instance state:
- uninitialized instance: completes install and stores authenticated state
- initialized instance: signs in and reuses authenticated state
Because of that, the @fresh install scenario only runs in the pnpm e2e:full* flows. The default pnpm e2e* flows exclude @fresh via Cucumber config tags so they can be re-run against an already initialized instance.
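A minimal sketch of that branch, assuming hypothetical injected helpers (`isInstanceInitialized`, `completeInstall`, `signIn`, `saveStorageState`); the real bootstrap lives in `features/support/hooks.ts`. Saving the state could, for example, use Playwright's `browserContext.storageState({ path })`.

```typescript
// Hypothetical dependencies; names are illustrative, not the suite's real API.
export interface AuthBootstrapDeps {
  isInstanceInitialized: () => Promise<boolean>
  completeInstall: () => Promise<void>
  signIn: () => Promise<void>
  saveStorageState: () => Promise<void>
}

export async function bootstrapAuth(deps: AuthBootstrapDeps): Promise<'installed' | 'signed-in'> {
  const initialized = await deps.isInstanceInitialized()
  if (initialized)
    await deps.signIn() // initialized instance: sign in with existing credentials
  else
    await deps.completeInstall() // uninitialized instance: run the /install flow
  // Both paths end with a stored authenticated state that later scenarios reuse.
  await deps.saveStorageState()
  return initialized ? 'signed-in' : 'installed'
}
```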
Reset all persisted E2E state:
```bash
pnpm -C e2e e2e:reset
```
This removes:
- `docker/volumes/db/data`
- `docker/volumes/redis/data`
- `docker/volumes/weaviate`
- `docker/volumes/plugin_daemon`
- `e2e/.auth`
- `e2e/.logs`
- `e2e/cucumber-report`
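The reset amounts to deleting those paths relative to the repository root. This is a hypothetical sketch (`resetE2EState` is not the name used in `scripts/setup.ts`), with the remover injectable so it can be exercised without touching disk.

```typescript
import { rmSync } from 'node:fs'
import { join } from 'node:path'

// Paths removed by e2e:reset, as listed in this guide (relative to the repo root).
export const E2E_RESET_PATHS = [
  'docker/volumes/db/data',
  'docker/volumes/redis/data',
  'docker/volumes/weaviate',
  'docker/volumes/plugin_daemon',
  'e2e/.auth',
  'e2e/.logs',
  'e2e/cucumber-report',
] as const

// Hypothetical reset helper. `force: true` makes already-missing paths a no-op,
// so the reset can be run repeatedly.
export function resetE2EState(
  repoRoot: string,
  remove: (path: string) => void = path => rmSync(path, { recursive: true, force: true }),
): string[] {
  const targets = E2E_RESET_PATHS.map(relative => join(repoRoot, relative))
  for (const target of targets)
    remove(target)
  return targets
}
```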
Start the full middleware stack:
```bash
pnpm -C e2e e2e:middleware:up
```
Stop the full middleware stack:
```bash
pnpm -C e2e e2e:middleware:down
```
The middleware stack includes:
- PostgreSQL
- Redis
- Weaviate
- Sandbox
- SSRF proxy
- Plugin daemon
Fresh install verification:
```bash
pnpm -C e2e e2e:full
```
Run the Cucumber suite against an already running middleware stack:
```bash
pnpm -C e2e e2e:middleware:up
pnpm -C e2e e2e
pnpm -C e2e e2e:middleware:down
```
Artifacts and diagnostics:
- `cucumber-report/report.html`: HTML report
- `cucumber-report/report.json`: JSON report
- `cucumber-report/artifacts/`: failure screenshots and HTML captures
- `.logs/cucumber-api.log`: backend startup log
- `.logs/cucumber-web.log`: frontend startup log
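Open the HTML report locally from the `e2e/` directory with:
```bash
open cucumber-report/report.html
```
`open` is the macOS command; on Linux, use `xdg-open` instead.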
Writing new scenarios
Workflow
- Create a `.feature` file under `features/<capability>/`
- Add step definitions under `features/step-definitions/<capability>/`
- Reuse existing steps from `common/` and other definition files before writing new ones
- Run `pnpm -C e2e e2e -- --tags @your-tag` to verify
- Run `pnpm -C e2e check` before committing
Feature file conventions
Tag every feature or scenario with a capability tag. Add auth tags only when they clarify intent or change the browser session behavior:
```gherkin
@datasets @authenticated
Feature: Create dataset
  Scenario: Create a new empty dataset
    Given I am signed in as the default E2E admin
    When I open the datasets page
    ...
```
- Capability tags (`@apps`, `@auth`, `@datasets`, …) group related scenarios for selective runs
- Auth/session tags:
  - default behavior: scenarios run with the shared authenticated `storageState` unless marked otherwise
  - `@unauthenticated`: uses a clean BrowserContext with no cookies or storage
  - `@authenticated`: optional intent tag for readability or selective runs; it does not currently change hook behavior on its own
- `@fresh`: only runs in `e2e:full` mode (requires an uninitialized instance)
- `@skip`: excluded from all runs
Keep scenarios short and declarative. Each step should describe what the user does, not how the UI works.
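How the auth/session tags translate into browser context options can be sketched as follows. `contextOptionsForTags` and the `.auth/admin.json` file name are hypothetical; the returned object mirrors the `storageState` field of Playwright's `browser.newContext()` options.

```typescript
// Hypothetical location of the shared authenticated state written in BeforeAll.
export const SHARED_AUTH_STATE = '.auth/admin.json'

// Minimal subset of Playwright's newContext() options used here.
export interface SessionContextOptions {
  storageState?: string
}

export function contextOptionsForTags(
  tags: readonly string[],
  authStatePath: string = SHARED_AUTH_STATE,
): SessionContextOptions {
  // @unauthenticated: start from a clean context with no cookies or storage.
  if (tags.includes('@unauthenticated'))
    return {}
  // Default, with or without @authenticated: reuse the shared session.
  return { storageState: authStatePath }
}
```

In a `Before` hook, the result would be passed to `browser.newContext(...)`, so `@authenticated` and untagged scenarios get identical contexts.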
Step definition conventions
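```typescript
import { Then, When } from '@cucumber/cucumber'
import { expect } from '@playwright/test'
import type { DifyWorld } from '../../support/world'

When('I open the datasets page', async function (this: DifyWorld) {
  await this.getPage().goto('/datasets')
})

// Illustrative assertion step (hypothetical wording) using the Then and expect imports
Then('I should be on the datasets page', async function (this: DifyWorld) {
  await expect(this.getPage()).toHaveURL(/\/datasets/)
})
```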
Rules:
- Always type `this` as `DifyWorld` for proper context access
- Use `async function`, not arrow functions: Cucumber binds `this` to the scenario's World, and arrow functions cannot be rebound
- One step = one user-visible action or one assertion
- Keep steps stateless across scenarios; use `DifyWorld` properties for in-scenario state
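Why `async function` matters: Cucumber creates a new World per scenario and calls each step with `this` bound to it. The simplified runner below (hypothetical `FakeWorld` and `runScenario`, synchronous for brevity) shows that binding and why state does not leak between scenarios. An arrow function would ignore the receiver passed to `.call` and keep the `this` of the scope where it was defined.

```typescript
// Hypothetical stand-in for DifyWorld: it holds in-scenario state only.
export class FakeWorld {
  createdAppName?: string
}

export type StepFn = (this: FakeWorld, ...args: string[]) => void

// Simplified runner: like Cucumber, it builds a fresh world for every
// scenario and invokes each step with `this` bound to that world.
export function runScenario(steps: Array<[StepFn, string[]]>): FakeWorld {
  const world = new FakeWorld()
  for (const [step, args] of steps)
    step.call(world, ...args)
  return world
}

// Regular functions receive the world through `this`.
export const createApp: StepFn = function (this: FakeWorld, name: string) {
  this.createdAppName = name
}

export const assertAppCreated: StepFn = function (this: FakeWorld, name: string) {
  if (this.createdAppName !== name)
    throw new Error(`expected app "${name}", got "${this.createdAppName}"`)
}
```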
Locator priority
Follow the Playwright recommended locator strategy, in order of preference:
| Priority | Locator | Example | When to use |
|---|---|---|---|
| 1 | `getByRole` | `getByRole('button', { name: 'Create' })` | Default choice: accessible and resilient |
| 2 | `getByLabel` | `getByLabel('App name')` | Form inputs with visible labels |
| 3 | `getByPlaceholder` | `getByPlaceholder('Enter name')` | Inputs without visible labels |
| 4 | `getByText` | `getByText('Welcome')` | Static text content |
| 5 | `getByTestId` | `getByTestId('workflow-canvas')` | Only when no semantic locator works |
Avoid raw CSS/XPath selectors. They break when the DOM structure changes.
Assertions
Use `expect` from `@playwright/test`. It auto-waits and retries until the condition is met or the timeout expires:
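```typescript
// `page` is the scenario's Playwright Page, e.g. `const page = this.getPage()`

// URL assertion
await expect(page).toHaveURL(/\/datasets\/[a-f0-9-]+\/documents/)
// Element visibility
await expect(page.getByRole('button', { name: 'Save' })).toBeVisible()
// Element state
await expect(page.getByRole('button', { name: 'Submit' })).toBeEnabled()
// Negation
await expect(page.getByText('Loading')).not.toBeVisible()
// Longer per-assertion timeout instead of waitForTimeout
await expect(page.getByRole('button', { name: 'Save' })).toBeVisible({ timeout: 30_000 })
```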
Do not use manual waitForTimeout or polling loops. If you need a longer wait for a specific assertion, pass { timeout: 30_000 } to the assertion.
Cucumber expressions
Use Cucumber expression parameter types to extract values from Gherkin steps:
| Type | Pattern | Example step |
|---|---|---|
| `{string}` | Quoted string | `I select the "Workflow" app type` |
| `{int}` | Integer | `I should see {int} items` |
| `{float}` | Decimal | `the progress is {float} percent` |
| `{word}` | Single word | `I click the {word} tab` |
Prefer {string} for UI labels, names, and text content — it maps naturally to Gherkin's quoted values.
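To see how these parameter types pull values out of step text, here is a deliberately simplified matcher. It is not Cucumber's implementation (the `@cucumber/cucumber-expressions` package handles that): it supports only the four types in the table, uses narrower regexes than Cucumber's built-ins, and ignores optional text `( )` and alternation `/`. The name `matchExpression` is hypothetical.

```typescript
type Transform = (raw: string) => string | number

// Simplified stand-ins for Cucumber's built-in parameter types. The real
// patterns are broader: {string} also accepts single quotes and escaped quotes.
const PARAMETER_TYPES: Record<string, { regex: string, transform: Transform }> = {
  string: { regex: '"([^"]*)"', transform: raw => raw },
  int: { regex: '(-?\\d+)', transform: raw => Number.parseInt(raw, 10) },
  float: { regex: '(-?\\d*\\.?\\d+)', transform: raw => Number.parseFloat(raw) },
  word: { regex: '([^\\s]+)', transform: raw => raw },
}

function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')
}

// Compiles a Cucumber-style expression and returns the converted arguments
// extracted from `stepText`, or null when the text does not match.
export function matchExpression(expression: string, stepText: string): Array<string | number> | null {
  const transforms: Transform[] = []
  const source = expression
    .split(/(\{\w+\})/)
    .map((part) => {
      if (!/^\{\w+\}$/.test(part))
        return escapeRegExp(part) // literal text
      const type = PARAMETER_TYPES[part.slice(1, -1)]
      if (!type)
        throw new Error(`unsupported parameter type: ${part}`)
      transforms.push(type.transform)
      return type.regex // exactly one capture group per parameter
    })
    .join('')
  const match = new RegExp(`^${source}$`).exec(stepText)
  if (!match)
    return null
  return match.slice(1).map((raw, i) => transforms[i]!(raw))
}
```

For example, `matchExpression('I should see {int} items', 'I should see 3 items')` returns `[3]`, a number rather than the string `'3'`, and `{string}` yields the text without its quotes.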
Scoping locators
When the page has multiple similar elements, scope locators to a container:
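```typescript
import { When } from '@cucumber/cucumber'
import type { DifyWorld } from '../../support/world'

When('I fill in the app name in the dialog', async function (this: DifyWorld) {
  // Only inputs inside the open dialog are considered
  const dialog = this.getPage().getByRole('dialog')
  await dialog.getByPlaceholder('Give your app a name').fill('My App')
})
```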
Failure diagnostics
The After hook automatically captures on failure:
- Full-page screenshot (PNG)
- Page HTML dump
- Console errors and page errors
Artifacts are saved to cucumber-report/artifacts/ and attached to the HTML report. No extra code needed in step definitions.
Reusing existing steps
Before writing a new step definition, inspect the existing step definition files first. Reuse a matching step when the wording and behavior already fit, and only add a new step when the scenario needs a genuinely new user action or assertion. Steps in common/ are designed for broad reuse across all features.
Browse the step definition files directly:
- `features/step-definitions/common/`: auth guards and navigation assertions shared by all features
- `features/step-definitions/<capability>/`: domain-specific steps scoped to a single feature area
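Before adding a step, it can help to list what already exists. The sketch below scans step definition files for `Given`/`When`/`Then` calls with a quoted first argument. The function names are hypothetical, and the regex is deliberately simple: it misses RegExp-based steps and template-literal expressions.

```typescript
import { readdirSync, readFileSync, statSync } from 'node:fs'
import { join } from 'node:path'

// Matches calls such as When('I open the datasets page', ...);
// group 1 is the keyword, group 3 the expression text.
const STEP_CALL = /\b(Given|When|Then)\(\s*(['"])(.*?)\2/g

export function listStepExpressions(source: string): string[] {
  return Array.from(source.matchAll(STEP_CALL), match => `${match[1]} ${match[3]}`)
}

// Recursively collects .ts files under a step-definitions directory.
export function collectStepFiles(dir: string): string[] {
  return readdirSync(dir).flatMap((entry) => {
    const fullPath = join(dir, entry)
    if (statSync(fullPath).isDirectory())
      return collectStepFiles(fullPath)
    return fullPath.endsWith('.ts') ? [fullPath] : []
  })
}

// Prints every step expression with the file that defines it.
export function printStepInventory(root = 'features/step-definitions'): void {
  for (const file of collectStepFiles(root)) {
    for (const step of listStepExpressions(readFileSync(file, 'utf8')))
      console.log(`${file}: ${step}`)
  }
}
```

Run from `e2e/`, the default root matches the layout above; searching the output for a phrase is a quick way to find a step worth reusing.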