diff --git a/src/data-directory/README.md b/src/data-directory/README.md
index e69de29bb2..77fc9dd378 100644
--- a/src/data-directory/README.md
+++ b/src/data-directory/README.md
@@ -0,0 +1,81 @@
+# Data directory
+
+Purpose-built utilities, schemas, and workflows that power our Liquid `{% data %}` and `{% indented_data_reference %}` tags, reusable content, UI strings, and feature metadata. This directory covers how we read, validate, and serve files in `data/` across languages.
+
+## Purpose & scope
+- Provide a consistent API (`getDataByLanguage`, `getDeepDataByLanguage`) for loading `data/` files in Liquid rendering and server contexts.
+- Enforce schemas for critical data (features, variables, learning tracks, release notes, tables, glossaries, code languages, CTAs).
+- Ship CLI and CI helpers that keep `data/` clean (orphaned-feature detection, guardrails for PRs that delete features).
+- Out of scope: content authoring guidance (see `content/`), page routing (see `src/app`/`src/frame`), and general linter rules (see `src/content-linter`).
+
+## Architecture & key assets
+- `lib/get-data.ts`: translation-aware loader with memoized reads, forced-English exceptions, and UI data merging; used by Liquid tags and server contexts.
+- `lib/data-directory.ts` + `lib/filename-to-key.ts`: generic walker that turns files into dotted-key objects, with optional preprocessing.
+- `lib/data-schemas/`: AJV schema registry that auto-discovers schemas for `data/tables/*.yml` and registers other critical shapes (features, variables, learning tracks, release notes, glossaries, code languages, CTAs).
+- Middleware: `middleware/data-tables.ts` caches English table data in `req.context.tables`.
+- Scripts: `scripts/find-orphaned-features/*` (detect and delete unused `data/features/*.yml`) and `scripts/deleted-features-pr-comment.ts` (warn about feature deletions in PRs).
+- Tests: `tests/` covers schema validation, data loading, key normalization, and orphan-detection fixtures.
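The loader behavior summarized for `lib/get-data.ts` (translation-aware reads, forced-English exceptions, memoized reads except in development) can be illustrated with a minimal in-memory sketch. This is not the real `get-data.ts`: `createDataLoader`, `resolveDotted`, `FORCED_ENGLISH_PREFIXES`, and the `debugReads` option are hypothetical names, nested objects stand in for files under `data/`, and only a *missing* localized value triggers the English fallback (the real loader also falls back on unparsable files).

```typescript
// Hypothetical sketch only; it is not the real lib/get-data.ts.
// Nested objects stand in for files under data/, keyed by language code.
type DataTree = { [key: string]: unknown }

// Illustrative stand-in for ALWAYS_ENGLISH_YAML_FILES / ALWAYS_ENGLISH_MD_FILES.
const FORCED_ENGLISH_PREFIXES = ['variables.hypothetical_forced']

interface LoaderOptions {
  nodeEnv?: string // reads are not cached when this is 'development'
  debugReads?: boolean // stands in for DEBUG_JIT_DATA_READS=true
}

// Resolve a dotted path such as 'ui.pages.home' against a nested object.
function resolveDotted(tree: DataTree | undefined, dottedPath: string): unknown {
  let node: unknown = tree
  for (const part of dottedPath.split('.')) {
    if (node === null || typeof node !== 'object') return undefined
    node = (node as DataTree)[part]
  }
  return node
}

function createDataLoader(treesByLang: Record<string, DataTree>, options: LoaderOptions = {}) {
  const cache = new Map<string, unknown>()
  const useCache = options.nodeEnv !== 'development'
  let reads = 0

  // Simulates a single read of one language's data.
  function read(langCode: string, dottedPath: string): unknown {
    reads++
    if (options.debugReads) console.log(`data read: ${langCode} ${dottedPath}`)
    return resolveDotted(treesByLang[langCode], dottedPath)
  }

  function getDataByLanguage(dottedPath: string, langCode: string): unknown {
    // Forced-English paths ignore the requested language.
    const forced = FORCED_ENGLISH_PREFIXES.some(
      (prefix) => dottedPath === prefix || dottedPath.startsWith(`${prefix}.`),
    )
    const lang = forced ? 'en' : langCode
    const cacheKey = `${lang}:${dottedPath}`
    if (useCache && cache.has(cacheKey)) return cache.get(cacheKey)

    let value = read(lang, dottedPath)
    // Fall back to English when the localized value is missing.
    if (value === undefined && lang !== 'en') value = read('en', dottedPath)

    if (useCache) cache.set(cacheKey, value)
    return value
  }

  return { getDataByLanguage, readCount: () => reads }
}

// Usage with made-up data.
const loader = createDataLoader({
  en: { ui: { pages: { home: 'Home', about: 'About' } } },
  es: { ui: { pages: { home: 'Inicio' } } },
})
console.log(loader.getDataByLanguage('ui.pages.home', 'es')) // prints: Inicio
console.log(loader.getDataByLanguage('ui.pages.about', 'es')) // prints: About (English fallback)
```

In this sketch the cache key uses the resolved language and is written after the fallback, so a repeated localized miss returns the memoized English value instead of reading again.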
+
+## Data loading contracts
+- `lib/get-data.ts`
+  - `getDataByLanguage(dottedPath, langCode)`: Returns a single value (YAML/MD/variables/reusables/ui/glossaries/release-notes/product-examples).
+  - `getDeepDataByLanguage(dottedPath, langCode)`: Returns a nested object for an entire subtree (e.g., `tables`, `features`).
+  - Translation fallbacks: If a localized file is missing or unparsable, the loader falls back to English. Certain files are always read in English (`ALWAYS_ENGLISH_YAML_FILES`, `ALWAYS_ENGLISH_MD_FILES`).
+  - Memoization: Reads are cached except when `NODE_ENV=development`, which simplifies local debugging.
+- `lib/data-directory.ts`
+  - Recursively walks a directory, filters by extension (`.json`, `.md`/`.markdown`, `.yml`) and ignore patterns, and emits a dotted-key object using `filename-to-key`.
+  - Optional `preprocess` hook for content transformation (used in tests and earlier scripts).
+
+## Schemas and validation
+- Schema registry: `lib/data-schemas/index.ts` maps data paths to schema modules and auto-registers any `data/tables/*.yml` that has a matching `data-schemas/tables/{name}.ts`.
+- Tests: `src/data-directory/tests/data-schemas.ts` loads schemas via AJV and asserts that every registered file validates.
+- Adding a schema:
+  1. Create `src/data-directory/lib/data-schemas/.ts` (or `tables/.ts`).
+  2. If it is not a table schema, add it to `manualSchemas` in `data-schemas/index.ts`; table schemas are auto-detected.
+  3. Run the tests (see Testing below).
+
+## Middleware
+- `middleware/data-tables.ts` populates `req.context.tables` with `getDeepDataByLanguage('tables', 'en')`. It is intended for server/Express contexts that need table data without per-request file I/O.
+
+## Scripts & workflows
+- `npm run find-orphaned-features -- --source-directory data/features --output orphans.json`
+  - Scans pages, reusables, and variables (all languages) for `{% ifversion %}` feature references and reports unused `data/features/*.yml` files.
+- `npm run find-orphaned-features delete -- orphans.json --max 10`
+  - After manual review, deletes up to N (`--max`) orphaned feature files from the English root.
+- `npm run deleted-features-pr-comment -- `
+  - Generates a Markdown warning when a PR removes or renames feature files; used in CI (requires `GITHUB_TOKEN`).
+
+## Testing
+- All tests: `npm test -- src/data-directory/tests`
+- Targeted:
+  - Schemas: `npm test -- src/data-directory/tests/data-schemas.ts`
+  - Orphans: `npm test -- src/data-directory/tests/orphaned-features.ts`
+  - Loader basics: `npm test -- src/data-directory/tests/get-data.ts`
+
+## Data conventions and consumers
+- File locations: Everything lives under `data/` (English and localized mirrors). Reusables, variables, and UI strings are read via dotted paths (`reusables.foo.bar`, `variables.product.prodname_ghe_server`, `ui.pages.home`).
+- Markdown in data: Frontmatter is stripped with `gray-matter`, and the remaining content is trimmed.
+- Downstream consumers:
+  - Liquid tags: `content-render/liquid/data.ts`, `indented-data-reference.ts`
+  - Content linter: `content-linter/lib/linting-rules/liquid-data-tags.ts`, `frontmatter-intro-links.ts`
+  - Server: `app/lib/app-router-context.ts`, `app/lib/server-context-utils.ts`
+  - Metrics/tests: `content-render/tests`, `content-linter/tests/site-data-references.ts`
+- Translation notes:
+  - Fallbacks ensure that missing localized YAML/MD files are read from English.
+  - Specific files are forced to English to avoid corrupt translations (see the constants in `get-data.ts`).
+
+## Setup & usage tips
+- Ensure `data/` exists relative to the project root; schemas auto-scan `data/tables` at runtime.
+- Set `DEBUG_JIT_DATA_READS=true` to log every on-disk read from the data loaders; this is useful alongside tests or local runs to trace which data files are touched.
+- When adding a new data directory:
+  - Prefer YAML for structured data; add a schema if the shape matters for correctness.
+  - Add a README under `data//` when introducing new contracts.
+  - Update `manualSchemas` if the data is not a table.
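The registration rules described in this README (explicit `manualSchemas` entries for non-table data, plus automatic registration of each `data/tables/*.yml` that has a same-named schema under `data-schemas/tables/`) can be sketched as a pure function over file lists. This is an illustration under those assumptions, not the contents of `lib/data-schemas/index.ts`: `buildSchemaRegistry`, `baseName`, and all paths in the usage lines are hypothetical, and the real registry works with loaded schema modules rather than returning their paths.

```typescript
// Hypothetical sketch of schema registration; not lib/data-schemas/index.ts.
// File lists are passed in so the logic can run without touching disk.

// 'data/tables/alpha.yml' -> 'alpha' (directory and extension removed).
function baseName(filePath: string, ext: string): string {
  const file = filePath.split('/').pop() ?? filePath
  return file.endsWith(ext) ? file.slice(0, -ext.length) : file
}

// Maps each data path to the schema module path that should validate it.
function buildSchemaRegistry(
  manualSchemas: Record<string, string>,
  tableDataFiles: string[],
  tableSchemaFiles: string[],
): Record<string, string> {
  // Index table schemas by name, e.g. 'alpha' -> '.../tables/alpha.ts'.
  const schemaByName = new Map<string, string>()
  for (const schemaFile of tableSchemaFiles) {
    schemaByName.set(baseName(schemaFile, '.ts'), schemaFile)
  }

  // Non-table data must be listed explicitly.
  const registry: Record<string, string> = { ...manualSchemas }

  // A table is registered only when a same-named schema exists.
  for (const dataFile of tableDataFiles) {
    if (!dataFile.endsWith('.yml')) continue
    const schemaFile = schemaByName.get(baseName(dataFile, '.yml'))
    if (schemaFile !== undefined) registry[dataFile] = schemaFile
  }
  return registry
}

// Usage with hypothetical paths: 'beta.yml' has no schema, so it is skipped
// and only 'data/variables' and 'data/tables/alpha.yml' are registered.
const registry = buildSchemaRegistry(
  { 'data/variables': 'data-schemas/variables.ts' },
  ['data/tables/alpha.yml', 'data/tables/beta.yml'],
  ['data-schemas/tables/alpha.ts'],
)
console.log(Object.keys(registry))
```

Matching by base name is what makes table schemas zero-configuration in this sketch, while anything outside `data/tables` stays opt-in through the manual map.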
+
+## Ownership & escalation
+- Primary: Docs Engineering.
+- Content changes: Docs Content (docs-content).
+
+## Current state & next steps
+- Current state: KTLO (keep the lights on); minimal changes expected. Update this README when touching data loaders, schemas, or scripts.
+- Next steps: Keep the schema registry aligned with new data shapes, and rerun `npm test -- src/data-directory/tests` when data contracts change.
\ No newline at end of file