mirror of synced 2025-12-19 09:57:42 -05:00

docs: update data-directory README (#58807)

Kevin Heis
2025-12-11 09:59:00 -08:00
committed by GitHub
parent 7e0d2a2be8
commit 95fadcfc4b


@@ -0,0 +1,81 @@
# Data directory
Purpose-built utilities, schemas, and workflows that power our Liquid `{% data %}` and `{% indented_data_reference %}` tags, reusable content, UI strings, and feature metadata. This subject focuses on how we read, validate, and serve files in `data/` across languages.
## Purpose & scope
- Provide a consistent API (`getDataByLanguage`, `getDeepDataByLanguage`) to load `data/` files for Liquid rendering and server contexts.
- Enforce schemas for critical data (features, variables, learning tracks, release notes, tables, glossaries, code languages, CTAs).
- Ship CLI and CI helpers that keep `data/` clean (orphaned feature detection, deleted-feature PR guardrails).
- Out of scope: content authoring guidance (see `content/`), page routing (see `src/app` and `src/frame`), and general linter rules (see `src/content-linter`).
## Architecture & key assets
- `lib/get-data.ts`: translation-aware loader with memoized reads, forced-English exceptions, and UI data merging; used by Liquid tags and server contexts.
- `lib/data-directory.ts` + `lib/filename-to-key.ts`: generic walker that turns files into dotted-key objects with optional preprocessing.
- `lib/data-schemas/`: AJV schema registry that auto-discovers `data/tables/*.yml` schemas and registers other critical shapes (features, variables, learning tracks, release notes, glossaries, code languages, CTAs).
- Middleware: `middleware/data-tables.ts` caches table data into `req.context.tables` (English).
- Scripts: `scripts/find-orphaned-features/*` (detect/delete unused `data/features/*.yml`) and `scripts/deleted-features-pr-comment.ts` (warn on feature deletions in PRs).
- Tests: `tests/` cover schema validation, data loading, key normalization, and orphan detection fixtures.
## Data loading contracts
- `lib/get-data.ts`
  - `getDataByLanguage(dottedPath, langCode)`: Returns a single value (YAML/MD/variables/reusables/ui/glossaries/release-notes/product-examples).
  - `getDeepDataByLanguage(dottedPath, langCode)`: Returns nested objects for an entire subtree (e.g., `tables`, `features`).
  - Translation fallbacks: If a localized file is missing or unparsable, falls back to English. Certain files are forced-English (`ALWAYS_ENGLISH_YAML_FILES`, `ALWAYS_ENGLISH_MD_FILES`).
  - Memoization: Caches reads except in `NODE_ENV=development` to simplify local debugging.
- `lib/data-directory.ts`
  - Recursively walks a directory, filters by extensions (`.json`, `.md`/`.markdown`, `.yml`) and ignore patterns, and emits a dotted-key object using `filename-to-key`.
  - Optional `preprocess` hook for content transformation (used in tests/prior scripts).
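
The walker contract described above can be sketched roughly as follows. This is an illustration only, not the code in `lib/data-directory.ts`: the names `filenameToKey`, `setDeep`, `walkDataDirectory`, and the `WalkOptions` shape are assumptions, and the sketch stores raw file text instead of parsing YAML or JSON.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical option names; the real module's signature may differ.
interface WalkOptions {
  extensions?: string[];
  ignorePatterns?: RegExp[];
  preprocess?: (content: string) => string;
}

type DataTree = { [key: string]: DataTree | string };

// "reusables/foo/bar.md" -> "reusables.foo.bar"
export function filenameToKey(relativePath: string): string {
  const ext = path.extname(relativePath);
  const withoutExt = ext ? relativePath.slice(0, -ext.length) : relativePath;
  return withoutExt.split(/[\\/]+/).filter(Boolean).join(".");
}

// Assign a value at a dotted key, creating intermediate objects as needed.
export function setDeep(tree: DataTree, dottedKey: string, value: string): void {
  const parts = dottedKey.split(".");
  const leaf = parts.pop();
  if (leaf === undefined) return;
  let node = tree;
  for (const part of parts) {
    const next = node[part];
    if (typeof next === "object") {
      node = next;
    } else {
      const created: DataTree = {};
      node[part] = created;
      node = created;
    }
  }
  node[leaf] = value;
}

export function walkDataDirectory(dir: string, options: WalkOptions = {}): DataTree {
  const {
    extensions = [".json", ".md", ".markdown", ".yml"],
    ignorePatterns = [],
    preprocess = (content: string) => content,
  } = options;
  const tree: DataTree = {};

  const visit = (current: string): void => {
    for (const entry of fs.readdirSync(current, { withFileTypes: true })) {
      const fullPath = path.join(current, entry.name);
      const relativePath = path.relative(dir, fullPath);
      if (ignorePatterns.some((pattern) => pattern.test(relativePath))) continue;
      if (entry.isDirectory()) {
        visit(fullPath);
      } else if (extensions.includes(path.extname(entry.name).toLowerCase())) {
        // A real loader would parse the file here; this sketch keeps raw text.
        const content = preprocess(fs.readFileSync(fullPath, "utf8"));
        setDeep(tree, filenameToKey(relativePath), content);
      }
    }
  };

  visit(dir);
  return tree;
}
```

With this shape, a file at `reusables/foo/bar.md` would surface as `tree.reusables.foo.bar`, matching the dotted paths used by the Liquid tags.
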
## Schemas and validation
- Schema registry: `lib/data-schemas/index.ts` maps data paths to schema modules; auto-registers any `data/tables/*.yml` that has a matching `data-schemas/tables/{name}.ts`.
- Tests: `src/data-directory/tests/data-schemas.ts` loads schemas via AJV and asserts every registered file validates.
- Adding a schema:
  1. Create `src/data-directory/lib/data-schemas/<name>.ts` (or `tables/<table>.ts`).
  2. If non-table, add to `manualSchemas` in `data-schemas/index.ts`; table schemas are auto-detected.
  3. Run tests (see below).
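
To make the split between manual and auto-detected schemas concrete, here is a minimal sketch of how such a registry could be assembled. It is not the code in `data-schemas/index.ts`: `buildSchemaRegistry`, the `SchemaRegistry` type, and the example entries in `manualSchemas` are assumptions for illustration, and the function takes file lists as arguments instead of reading `data/tables` from disk.

```typescript
// Maps a data path to the schema module that validates it.
// Hypothetical shape: the real registry may key or store things differently.
type SchemaRegistry = Record<string, string>;

// Non-table schemas are registered by hand (entries are illustrative).
export const manualSchemas: SchemaRegistry = {
  "data/features": "data-schemas/features.ts",
  "data/variables": "data-schemas/variables.ts",
};

// Register a table only when a schema file with the same base name exists.
export function buildSchemaRegistry(
  tableDataFiles: string[],
  tableSchemaFiles: string[],
  manual: SchemaRegistry = manualSchemas,
): SchemaRegistry {
  const schemaNames = new Set(
    tableSchemaFiles
      .filter((file) => file.endsWith(".ts"))
      .map((file) => file.slice(0, -".ts".length)),
  );
  const registry: SchemaRegistry = { ...manual };
  for (const file of tableDataFiles) {
    if (!file.endsWith(".yml")) continue;
    const name = file.slice(0, -".yml".length);
    if (schemaNames.has(name)) {
      registry[`data/tables/${name}.yml`] = `data-schemas/tables/${name}.ts`;
    }
  }
  return registry;
}
```

In this sketch a table file without a matching schema module is simply left out of the registry, which mirrors the "auto-registers any table that has a matching schema" rule described above.
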
## Middleware
- `middleware/data-tables.ts` populates `req.context.tables` with `getDeepDataByLanguage('tables', 'en')`. Intended for server/Express contexts where table data is needed without per-request file IO.
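
The caching behavior can be pictured with a small sketch like the one below. It is an assumption-laden illustration, not the contents of `middleware/data-tables.ts`: `createDataTablesMiddleware`, `RequestLike`, and the injected loader parameter are hypothetical, and the Express types are replaced with minimal stand-ins so the example has no dependencies.

```typescript
// Minimal stand-ins for Express types (assumptions for this sketch).
interface RequestLike {
  context?: Record<string, unknown>;
}
type Next = (err?: unknown) => void;
type DeepLoader = (dottedPath: string, langCode: string) => unknown;

export function createDataTablesMiddleware(getDeepDataByLanguage: DeepLoader) {
  let cachedTables: unknown;
  let loaded = false;

  return function dataTables(req: RequestLike, _res: unknown, next: Next): void {
    if (!loaded) {
      // Load the English tables once; later requests reuse the same object.
      cachedTables = getDeepDataByLanguage("tables", "en");
      loaded = true;
    }
    const context = req.context ?? (req.context = {});
    context.tables = cachedTables;
    next();
  };
}
```

Injecting the loader keeps the sketch testable; a real middleware would more likely import `getDeepDataByLanguage` directly.
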
## Scripts & workflows
- `npm run find-orphaned-features -- --source-directory data/features --output orphans.json`
  - Scans pages, reusables, and variables (all languages) for `{% ifversion %}` feature references and reports unused `data/features/*.yml`.
- `npm run find-orphaned-features delete -- orphans.json --max 10`
  - Deletes up to N orphaned feature files (English root) after manual review.
- `npm run deleted-features-pr-comment -- <owner> <repo> <base_sha> <head_sha>`
  - Generates a Markdown warning if a PR removes or renames feature files; used in CI (requires `GITHUB_TOKEN`).
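
The core idea behind orphan detection can be sketched as below. This is a simplified illustration under stated assumptions, not the logic in `scripts/find-orphaned-features/*`: the function names are hypothetical, the regular expression only looks at `{% ifversion %}` and `{% elsif %}` conditions, and the inputs are in-memory strings instead of files read from disk.

```typescript
// Collect known feature names that appear inside ifversion/elsif conditions.
export function findReferencedFeatures(text: string, knownFeatures: Set<string>): Set<string> {
  // Also tolerates whitespace-control dashes, e.g. {%- ifversion foo -%}.
  const tagPattern = /\{%-?\s*(?:ifversion|elsif)\s+([^%]*?)\s*-?%\}/g;
  const referenced = new Set<string>();
  let match: RegExpExecArray | null;
  while ((match = tagPattern.exec(text)) !== null) {
    const condition = match[1] ?? "";
    for (const token of condition.split(/\s+/)) {
      if (knownFeatures.has(token)) referenced.add(token);
    }
  }
  return referenced;
}

// A feature is orphaned when no scanned text references it in a condition.
export function findOrphanedFeatures(featureNames: string[], texts: string[]): string[] {
  const known = new Set(featureNames);
  const used = new Set<string>();
  for (const text of texts) {
    findReferencedFeatures(text, known).forEach((name) => used.add(name));
  }
  return featureNames.filter((name) => !used.has(name));
}
```

Matching tokens against the known feature names avoids mistaking operators such as `or` and `and`, or plain version names, for feature references.
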
## Testing
- All tests: `npm test -- src/data-directory/tests`
- Targeted:
  - Schemas: `npm test -- src/data-directory/tests/data-schemas.ts`
  - Orphans: `npm test -- src/data-directory/tests/orphaned-features.ts`
  - Loader basics: `npm test -- src/data-directory/tests/get-data.ts`
## Data conventions and consumers
- File locations: Everything under `data/` (English and localized mirrors). Reusables/variables/ui are read via dotted paths (`reusables.foo.bar`, `variables.product.prodname_ghe_server`, `ui.pages.home`).
- Markdown in data: Frontmatter is stripped by `gray-matter`; content is trimmed.
- Downstream consumers:
  - Liquid tags: `content-render/liquid/data.ts`, `indented-data-reference.ts`
  - Content linter: `content-linter/lib/linting-rules/liquid-data-tags.ts`, `frontmatter-intro-links.ts`
  - Server: `app/lib/app-router-context.ts`, `app/lib/server-context-utils.ts`
  - Metrics/tests: `content-render/tests`, `content-linter/tests/site-data-references.ts`
- Translation notes:
  - Fallbacks ensure missing localized YAML/MD reads from English.
  - Specific files are forced-English to avoid corrupt translations (see constants in `get-data.ts`).
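
The fallback, forced-English, and memoization rules can be combined in a sketch like this one. It is a hedged illustration, not the implementation in `get-data.ts`: `createLoader`, the `Reader` type, and passing the forced-English paths as a `Set` are assumptions, and the reader is assumed to return `undefined` for a missing file and throw for an unparsable one.

```typescript
// Hypothetical reader: returns undefined when a file is missing,
// throws when it exists but cannot be parsed.
type Reader = (langCode: string, dottedPath: string) => unknown;

export function createLoader(read: Reader, alwaysEnglish: Set<string>, memoize: boolean) {
  const cache = new Map<string, unknown>();

  return function getData(dottedPath: string, langCode: string): unknown {
    // Forced-English paths ignore the requested language entirely.
    const effectiveLang = alwaysEnglish.has(dottedPath) ? "en" : langCode;
    const cacheKey = `${effectiveLang}:${dottedPath}`;
    if (memoize && cache.has(cacheKey)) return cache.get(cacheKey);

    let value: unknown;
    try {
      value = read(effectiveLang, dottedPath);
    } catch (error) {
      // English has nothing to fall back to, so surface the error.
      if (effectiveLang === "en") throw error;
      value = undefined;
    }
    // Missing or unparsable localized data falls back to English.
    if (value === undefined && effectiveLang !== "en") {
      value = read("en", dottedPath);
    }
    if (memoize) cache.set(cacheKey, value);
    return value;
  };
}
```

Passing `memoize` as `false` would correspond to the development behavior described earlier, where every call re-reads the underlying file.
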
## Setup & usage tips
- Ensure `data/` exists relative to project root; schemas auto-scan `data/tables` at runtime.
- Set `DEBUG_JIT_DATA_READS=true` to log every on-disk read from the data loaders; useful alongside tests or local runs to trace which data files are touched.
- When adding a new data directory:
  - Prefer YAML for structured data; add schema if shape matters to correctness.
  - Add README under `data/<dir>/` when introducing new contracts.
  - Update `manualSchemas` if not a table.
## Ownership & escalation
- Primary: Docs Engineering.
- Content changes: Docs Content (docs-content).
## Current state & next steps
- Current state: KTLO (keep the lights on); minimal changes expected. Update this README when touching data loaders, schemas, or scripts.
- Next steps: Keep the schema registry aligned with new data shapes and rerun `npm test -- src/data-directory/tests` when data contracts change.