Appendix D — Source archive index
The canonical index of every external source cited in the book. Each entry renders with title, author, publish date, capture date, trust tier, and (when available) a Perma.cc archival link. The full Playwright capture of each source lives in the repo's gitignored local cache for drift detection; only the metadata below is public.
Every claim in this book that rests on an external source is tagged with a <Citation> that resolves to an entry in this archive. The archive itself lives in version-controlled YAML; this page is the human-readable view. Ch 15’s source-tiering methodology is implemented concretely here — every source carries a tier from T1-official to T4-conjecture, and the audit cadence for each chapter is shaped by the tier mix of its cited sources.
How the archive works
The book ships a structured source manifest at sources/manifest.yaml. Each entry has a stable slug, a URL, a title, an author, a publish date, a capture date, a trust tier, and (optionally) a Perma.cc archival link and a hash of the locally-captured content.
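As a rough illustration of the entry shape described above, the sketch below models one manifest entry in Python. The field names (`publish_date`, `perma_cc`, `content_hash`, and so on) and the example values are assumptions made for illustration; the manifest's actual YAML keys are not shown on this page.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# The four tiers named in this appendix.
TIERS = ("T1-official", "T2-release-notes", "T3-practitioner", "T4-conjecture")


@dataclass(frozen=True)
class SourceEntry:
    """One manifest entry. Field names are hypothetical, not the real YAML keys."""
    slug: str
    url: str
    title: str
    author: str
    publish_date: date
    capture_date: date
    tier: str
    perma_cc: Optional[str] = None       # optional archival link
    content_hash: Optional[str] = None   # optional hash of the local capture

    def __post_init__(self) -> None:
        # Reject entries whose tier is not one of the four known tiers.
        if self.tier not in TIERS:
            raise ValueError(f"unknown tier: {self.tier!r}")


# Placeholder values only; this is not a real entry from the archive.
example = SourceEntry(
    slug="example-source",
    url="https://example.com/post",
    title="An example post",
    author="A. Author",
    publish_date=date(2020, 1, 1),
    capture_date=date(2024, 1, 1),
    tier="T3-practitioner",
)
```

Validating the tier at construction time means a mistyped tier would fail when the manifest is loaded rather than surfacing later as a missing badge.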
Citations in chapter MDX reference the slug: <Citation src="gwern-sidenote" /> resolves at build time to the metadata below. The <Citation> component renders the trust tier as a badge so readers can calibrate their confidence at the point of use, and exposes the archival link when present.
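One way to picture the build-time resolution step is a check that every cited slug has a manifest entry. The sketch below is an assumption about how such a check could look, not the book's actual build code: the function name `unresolved_slugs` is hypothetical, and the regular expression only handles the simple self-closing form shown above.

```python
import re

# Matches the self-closing form <Citation src="some-slug" /> and captures the slug.
CITATION_RE = re.compile(r'<Citation\s+src="([^"]+)"\s*/>')


def unresolved_slugs(mdx_text: str, known_slugs: set[str]) -> list[str]:
    """Return cited slugs with no manifest entry, in order of first appearance."""
    missing: list[str] = []
    for slug in CITATION_RE.findall(mdx_text):
        if slug not in known_slugs and slug not in missing:
            missing.append(slug)
    return missing


chapter = (
    'Sidenotes keep detail close <Citation src="gwern-sidenote" />, '
    'unlike this one <Citation src="not-in-manifest" />.'
)
problems = unresolved_slugs(chapter, {"gwern-sidenote"})
```

A build could fail when `problems` is non-empty, so a citation never renders without its tier badge.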
The full captured HTML + PDF + screenshot of each source is held in the repo’s gitignored sources/cache/<hash>/ directory — used for drift detection (re-fetch periodically, compare hashes, flag significant changes) but never published. The defensive posture this creates (reader sees metadata and archival link, full cached content stays private) keeps the legal surface small while preserving the author’s ability to verify integrity.
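The hash comparison at the heart of drift detection can be sketched in a few lines. This version assumes SHA-256 over the raw captured bytes; the page does not say which hash the manifest uses, and the function names are hypothetical. An exact hash comparison flags any byte-level change, so deciding which changes count as significant would still fall to a separate step or a human reviewer.

```python
import hashlib


def content_hash(captured: bytes) -> str:
    """Hex digest of a captured source (SHA-256 is an assumption)."""
    return hashlib.sha256(captured).hexdigest()


def has_drifted(recorded_hash: str, refetched: bytes) -> bool:
    """True when a re-fetched capture no longer matches the recorded hash."""
    return content_hash(refetched) != recorded_hash


# Record a hash at capture time, then compare against a later re-fetch.
recorded = content_hash(b"<html>original page</html>")
unchanged = has_drifted(recorded, b"<html>original page</html>")
changed = has_drifted(recorded, b"<html>edited page</html>")
```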
The archive
The listing below is rendered live from sources/manifest.yaml at build time by the <SourceArchive /> component. Entries are grouped by tier in descending authority (T1 → T4); within a tier, newest publish dates come first. Empty tiers render an honest placeholder — the archive is intentionally sparse in the early book, and visible gaps are part of the pedagogy.
T1 · Official (no entries yet)
Vendor-official documentation or release notes. Highest trust for factual claims about the vendor’s own tool.
No sources at this tier yet.
T2 · Release notes (no entries yet)
Release blog posts, changelogs, conference talks. Trustworthy for intent and availability claims.
No sources at this tier yet.
T3 · Practitioner (2 entries)
Respected community writing with a durable argument the author has defended over time.
T4 · Conjecture (no entries yet)
Blog posts, tweets, or unverified claims. Pointers to investigate, not authorities.
No sources at this tier yet.
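The ordering used by the listing above (tiers in descending authority, newest publish date first within a tier, empty tiers kept visible) can be sketched as follows. This is an illustrative Python version, not the `<SourceArchive />` component itself; the dictionary keys and the sample entries are hypothetical.

```python
from datetime import date

# Tiers in descending authority, as in the listing.
TIER_ORDER = ["T1-official", "T2-release-notes", "T3-practitioner", "T4-conjecture"]


def group_by_tier(entries: list[dict]) -> dict[str, list[dict]]:
    """Group entries by tier; newest publish date first within each tier."""
    # Start with every tier present so empty tiers can render a placeholder.
    groups: dict[str, list[dict]] = {tier: [] for tier in TIER_ORDER}
    for entry in sorted(entries, key=lambda e: e["publish_date"], reverse=True):
        groups[entry["tier"]].append(entry)
    return groups


# Placeholder entries for illustration only.
sample = [
    {"slug": "older-essay", "tier": "T3-practitioner", "publish_date": date(2020, 1, 1)},
    {"slug": "vendor-docs", "tier": "T1-official", "publish_date": date(2019, 1, 1)},
    {"slug": "newer-essay", "tier": "T3-practitioner", "publish_date": date(2023, 6, 1)},
]
grouped = group_by_tier(sample)
```

Pre-populating every tier with an empty list is what lets a renderer show "No sources at this tier yet." instead of silently omitting the tier.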
What’s coming
The archive is intentionally sparse at this point in the book’s development. Stage 3 of the roadmap expands it substantially:
- Tool-specific citations. Claude Code’s documentation, Gemini CLI’s release notes, Codex CLI’s reference pages — currently referenced only implicitly in the companion appendices, with explicit citations to land as Stage 3 chapters complete their research phase.
- Academic and practitioner sources. Koller-Friedman’s Probabilistic Graphical Models (the pedagogical framework), Tufte’s design books (the layout inheritance), practitioner long-form on AI-assisted development — landing as chapters citing them move out of draft.
- Faceted filtering (post-v1.0). Client-side filters by tier, tool, and publish-date band over the auto-rendered listing above. The same mechanism is already live as the chapter-level tool filter; the faceted source view reuses that plumbing.
- Drift-detection workflow. A scheduled job (quarterly) re-fetches every T1 source and flags significant hash changes for human review. The manifest grows a last_verified_at field tracking when each source was last re-checked against its live URL.
Audit expectations
Per Ch 15’s methodology, each entry in the archive carries an implicit re-verification cadence based on its tier:
- T1-official sources: verify on major vendor releases (triggered by release, not by calendar).
- T2-release-notes sources: verify quarterly.
- T3-practitioner sources: verify annually — the author is unlikely to retract, but arguments do evolve.
- T4-conjecture sources: verify before relying on them; otherwise do not cite.
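The cadence rules above can be expressed as a small decision function. The sketch below is a hedged illustration: the function name, the `major_release_since_check` flag, and the day counts chosen for "quarterly" (91 days) and "annually" (365 days) are all assumptions, not values taken from the book's tooling.

```python
from datetime import date, timedelta
from typing import Optional

# Calendar-driven tiers; the exact day counts are assumptions.
CALENDAR_CADENCE = {
    "T2-release-notes": timedelta(days=91),   # roughly quarterly
    "T3-practitioner": timedelta(days=365),   # roughly annually
}


def needs_verification(
    tier: str,
    last_verified_at: Optional[date],
    today: date,
    major_release_since_check: bool = False,
) -> bool:
    """Decide whether a source is due for re-verification under the tier rules."""
    if last_verified_at is None:
        return True  # never verified
    if tier == "T1-official":
        # Triggered by a major vendor release, not by the calendar.
        return major_release_since_check
    if tier == "T4-conjecture":
        # Always verify before relying on it.
        return True
    return today - last_verified_at >= CALENDAR_CADENCE[tier]


due = needs_verification("T2-release-notes", date(2024, 1, 1), date(2024, 5, 1))
not_due = needs_verification("T2-release-notes", date(2024, 1, 1), date(2024, 3, 1))
```

Treating T1 as event-driven keeps an official source from being flagged just because time has passed, which matches the "triggered by release, not by calendar" rule.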
The archive’s integrity is the book’s integrity. A source whose content has shifted beneath a citation silently demotes the chapter that cited it to an unknown state. The drift-detection workflow above is the machinery that keeps the gap small; the human audit cadence is the safety net when the machinery misses something.
Quick reference
- Canonical location: sources/manifest.yaml; citations by slug via <Citation src="slug" />
- Full captures in sources/cache/<hash>/ (gitignored; drift-detection only)
- Four tiers: T1-official, T2-release-notes, T3-practitioner, T4-conjecture
- Audit cadence scales with tier — T1 on release, T2 quarterly, T3 annually, T4 before citing
- Stage 3 replaces this static page with an auto-rendered manifest view
- Volatility: architectural-pattern — the archive’s structure is durable; specific entries change as the book grows.