Data directory
Purpose-built utilities, schemas, and workflows that power our Liquid {% data %} and {% indented_data_reference %} tags, reusable content, UI strings, and feature metadata. This subject focuses on how we read, validate, and serve files in data/ across languages.
Purpose & scope
- Provide a consistent API (getDataByLanguage, getDeepDataByLanguage) to load data/ files for Liquid rendering and server contexts.
- Enforce schemas for critical data (features, variables, learning tracks, release notes, tables, glossaries, code languages, CTAs).
- Ship CLI and CI helpers that keep data/ clean (orphaned feature detection, deleted-feature PR guardrails).
- Exclude: content authoring guidance (see content/), page routing (see src/app, src/frame), and general linter rules (see src/content-linter).
Architecture & key assets
- lib/get-data.ts: translation-aware loader with memoized reads, forced-English exceptions, and UI data merging; used by Liquid tags and server contexts.
- lib/data-directory.ts + lib/filename-to-key.ts: generic walker that turns files into dotted-key objects with optional preprocessing.
- lib/data-schemas/: AJV schema registry that auto-discovers data/tables/*.yml schemas and registers other critical shapes (features, variables, learning tracks, release notes, glossaries, code languages, CTAs).
- Middleware: middleware/data-tables.ts caches table data into req.context.tables (English).
- Scripts: scripts/find-orphaned-features/* (detect/delete unused data/features/*.yml) and scripts/deleted-features-pr-comment.ts (warn on feature deletions in PRs).
- Tests: tests/ cover schema validation, data loading, key normalization, and orphan detection fixtures.
Data loading contracts
lib/get-data.ts
- getDataByLanguage(dottedPath, langCode): Returns a single value (YAML/MD/variables/reusables/ui/glossaries/release-notes/product-examples).
- getDeepDataByLanguage(dottedPath, langCode): Returns nested objects for an entire subtree (e.g., tables, features).
- Translation fallbacks: If a localized file is missing or unparsable, the loader falls back to English. Certain files are forced-English (ALWAYS_ENGLISH_YAML_FILES, ALWAYS_ENGLISH_MD_FILES).
- Memoization: Caches reads except in NODE_ENV=development to simplify local debugging.
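The fallback and memoization behavior described above can be illustrated with a minimal in-memory sketch. This is not the code in lib/get-data.ts: the names getDataSketch, resolveDottedPath, and dataByLanguage are hypothetical, the data is made up, and the real loader reads files from disk rather than from an object.

```typescript
// Hypothetical in-memory stand-in for data/ and its translated mirrors.
type DataTree = { [key: string]: unknown };

const dataByLanguage: Record<string, DataTree> = {
  en: {
    variables: { product: { prodname_dotcom: "GitHub" } },
    ui: { header: { title: "Docs" } },
  },
  // The "ja" mirror is incomplete on purpose, to exercise the fallback.
  ja: { ui: { header: { title: "Docs JA" } } },
};

// Walk a dotted path such as "ui.header.title" through a nested object.
function resolveDottedPath(tree: DataTree, dottedPath: string): unknown {
  let node: unknown = tree;
  for (const key of dottedPath.split(".")) {
    if (typeof node !== "object" || node === null || !(key in node)) {
      return undefined;
    }
    node = (node as DataTree)[key];
  }
  return node;
}

const cache = new Map<string, unknown>();
let lookups = 0; // counts uncached lookups, for illustration only

function getDataSketch(dottedPath: string, langCode: string, memoize = true): unknown {
  const cacheKey = `${langCode}:${dottedPath}`;
  if (memoize && cache.has(cacheKey)) return cache.get(cacheKey);

  lookups += 1;
  let value = resolveDottedPath(dataByLanguage[langCode] ?? {}, dottedPath);
  // Fall back to English when the localized value is missing.
  if (value === undefined && langCode !== "en") {
    value = resolveDottedPath(dataByLanguage["en"] ?? {}, dottedPath);
  }
  if (memoize) cache.set(cacheKey, value);
  return value;
}
```

Passing memoize = false mimics the development-mode behavior, where every call goes back to the source so local edits show up immediately.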
lib/data-directory.ts
- Recursively walks a directory, filters by extensions (.json, .md/.markdown, .yml) and ignore patterns, and emits a dotted-key object using filename-to-key.
- Optional preprocess hook for content transformation (used in tests/prior scripts).
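As a rough illustration of the walker's output shape, the sketch below turns a flat map of relative paths into a nested object. It is a simplified assumption, not the real lib/data-directory.ts or lib/filename-to-key.ts: filenameToKeySketch and buildDataObject are hypothetical names, file contents are passed in as strings instead of being read from disk, and ignore patterns are left out.

```typescript
// Simplified, hypothetical take on the filename-to-key idea:
// "ui/pages/home.yml" -> "ui.pages.home"
function filenameToKeySketch(relativePath: string): string {
  return relativePath
    .replace(/\.[^./\\]+$/, "") // drop the file extension
    .split(/[\\/]+/) // split on POSIX or Windows separators
    .filter(Boolean)
    .join(".");
}

const ALLOWED_EXTENSIONS = [".json", ".md", ".markdown", ".yml"];

type Nested = { [key: string]: Nested | string };

// files maps relative paths to raw contents; preprocess is optional.
function buildDataObject(
  files: Record<string, string>,
  preprocess: (content: string) => string = (content) => content,
): Nested {
  const result: Nested = {};
  for (const [relativePath, content] of Object.entries(files)) {
    // Skip anything that is not a supported data file.
    if (!ALLOWED_EXTENSIONS.some((ext) => relativePath.endsWith(ext))) continue;

    const keys = filenameToKeySketch(relativePath).split(".");
    const leaf = keys.pop();
    if (leaf === undefined || leaf === "") continue;

    // Descend, creating intermediate objects as needed.
    let node = result;
    for (const key of keys) {
      const child = node[key];
      if (typeof child === "object") {
        node = child;
      } else {
        const next: Nested = {};
        node[key] = next;
        node = next;
      }
    }
    node[leaf] = preprocess(content);
  }
  return result;
}
```

For example, buildDataObject({ "ui/pages/home.yml": " title " }, (c) => c.trim()) yields an object whose ui.pages.home value is the trimmed string.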
Schemas and validation
- Schema registry: lib/data-schemas/index.ts maps data paths to schema modules; auto-registers any data/tables/*.yml that has a matching data-schemas/tables/{name}.ts.
- Tests: src/data-directory/tests/data-schemas.ts loads schemas via AJV and asserts every registered file validates.
- Adding a schema:
  - Create src/data-directory/lib/data-schemas/<name>.ts (or tables/<table>.ts).
  - If non-table, add to manualSchemas in data-schemas/index.ts; table schemas are auto-detected.
  - Run tests (see below).
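The split between manually registered schemas and auto-detected table schemas might look roughly like the sketch below. The entries in manualSchemasSketch, the buildRegistry function, and the shape of the registry values are all assumptions made for illustration; the real mapping lives in data-schemas/index.ts and may differ.

```typescript
// Hypothetical manual entries: data path -> schema module name.
const manualSchemasSketch: Record<string, string> = {
  "data/features": "features",
  "data/variables": "variables",
};

// Pair each data/tables/<name>.yml with a tables/<name> schema module,
// but only when such a module exists. Inputs are plain name lists here;
// a real implementation would list directories instead.
function buildRegistry(
  tableFiles: string[],
  tableSchemaModules: string[],
): Record<string, string> {
  const registry: Record<string, string> = { ...manualSchemasSketch };
  for (const file of tableFiles) {
    if (!file.endsWith(".yml")) continue;
    const name = file.slice(0, -".yml".length);
    if (tableSchemaModules.includes(name)) {
      registry[`data/tables/${file}`] = `tables/${name}`;
    }
  }
  return registry;
}
```

In this sketch a table file without a matching schema module is silently skipped, which is why non-table data needs an explicit manual entry to be validated at all.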
Middleware
- middleware/data-tables.ts populates req.context.tables with getDeepDataByLanguage('tables', 'en'). Intended for server/Express contexts where table data is needed without per-request file IO.
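A minimal sketch of this pattern, assuming a module-level cache and an Express-style (req, res, next) signature. RequestLike, loadEnglishTables, and dataTablesSketch are hypothetical names; loadEnglishTables stands in for getDeepDataByLanguage('tables', 'en'), and the real middleware may rely on the loader's own memoization instead of a local variable.

```typescript
type Tables = Record<string, unknown>;

// Minimal request shape for the sketch; the real request type is richer.
interface RequestLike {
  context?: { tables?: Tables };
}

let loadCount = 0; // counts simulated disk loads, for illustration only

// Hypothetical stand-in for getDeepDataByLanguage('tables', 'en').
function loadEnglishTables(): Tables {
  loadCount += 1;
  return { example: { rows: [] } };
}

let cachedTables: Tables | undefined;

function dataTablesSketch(req: RequestLike, _res: unknown, next: () => void): void {
  // Load once, then reuse the same object for every later request.
  if (cachedTables === undefined) {
    cachedTables = loadEnglishTables();
  }
  req.context = { ...req.context, tables: cachedTables };
  next();
}
```

Because the tables are always the English ones, a single cached object can safely be shared across requests in any language.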
Scripts & workflows
npm run find-orphaned-features -- --source-directory data/features --output orphans.json
- Scans pages, reusables, variables (all languages) for {% ifversion %} feature references and reports unused data/features/*.yml.

npm run find-orphaned-features delete -- orphans.json --max 10
- Deletes up to N orphaned feature files (English root) after manual review.

npm run deleted-features-pr-comment -- <owner> <repo> <base_sha> <head_sha>
- Generates a Markdown warning if a PR removes or renames feature files; used in CI (requires GITHUB_TOKEN).
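The core idea behind orphan detection can be sketched as a scan for feature names inside {% ifversion %} tags. This is a deliberately simplified assumption, not the logic in scripts/find-orphaned-features: referencedFeatures and findOrphans are hypothetical names, the regex also covers {% elsif %} as a guess, and the real script may parse Liquid properly and look in more places.

```typescript
const LIQUID_OPERATORS = ["or", "and", "not"];

// Collect identifiers that appear inside {% ifversion ... %} or {% elsif ... %}.
function referencedFeatures(content: string): string[] {
  const found: string[] = [];
  const tagPattern = /\{%-?\s*(?:ifversion|elsif)\s+([^%]*?)-?%\}/g;
  let match: RegExpExecArray | null;
  while ((match = tagPattern.exec(content)) !== null) {
    for (const token of (match[1] ?? "").split(/\s+/)) {
      if (token !== "" && !LIQUID_OPERATORS.includes(token)) {
        found.push(token);
      }
    }
  }
  return found;
}

// featureNames: basenames of data/features/*.yml without the extension.
// contents: raw text of pages, reusables, and variables in all languages.
function findOrphans(featureNames: string[], contents: string[]): string[] {
  const used = new Set<string>();
  for (const content of contents) {
    for (const name of referencedFeatures(content)) used.add(name);
  }
  return featureNames.filter((name) => !used.has(name));
}
```

Tokens that are not feature names (for example version strings such as ghes) also end up in the used set, which is harmless here because the set is only consulted for known feature names.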
Testing
- All tests: npm test -- src/data-directory/tests
- Targeted:
  - Schemas: npm test -- src/data-directory/tests/data-schemas.ts
  - Orphans: npm test -- src/data-directory/tests/orphaned-features.ts
  - Loader basics: npm test -- src/data-directory/tests/get-data.ts
Data conventions and consumers
- File locations: Everything under data/ (English and localized mirrors). Reusables/variables/ui are read via dotted paths (reusables.foo.bar, variables.product.prodname_ghe_server, ui.pages.home).
- Markdown in data: Frontmatter is stripped by gray-matter; content is trimmed.
- Downstream consumers:
  - Liquid tags: content-render/liquid/data.ts, indented-data-reference.ts
  - Content linter: content-linter/lib/linting-rules/liquid-data-tags.ts, frontmatter-intro-links.ts
  - Server: app/lib/app-router-context.ts, app/lib/server-context-utils.ts
  - Metrics/tests: content-render/tests, content-linter/tests/site-data-references.ts
- Translation notes:
  - Fallbacks ensure that missing or unparsable localized YAML/MD is read from English instead.
  - Specific files are forced-English to avoid corrupt translations (see the ALWAYS_ENGLISH_* constants in get-data.ts).
Setup & usage tips
- Ensure data/ exists relative to project root; schemas auto-scan data/tables at runtime.
- Set DEBUG_JIT_DATA_READS=true to log every on-disk read from the data loaders; useful alongside tests or local runs to trace which data files are touched.
- When adding a new data directory:
  - Prefer YAML for structured data; add a schema if shape matters to correctness.
  - Add a README under data/<dir>/ when introducing new contracts.
  - Update manualSchemas if not a table.
Ownership & escalation
- Primary: Docs Engineering.
- Content changes: Docs Content (docs-content).
Current state & next steps
- Current state: KTLO; minimal changes expected. Update this README when touching data loaders, schemas, or scripts.
- Next steps: Keep the schema registry aligned with new data shapes and rerun npm test -- src/data-directory/tests when data contracts change.