microsoft/selective-repo-fetch
TypeScript
Description: Enforce docs-as-code by declaratively separating documentation from code. Match manifest glob patterns against repo file trees to fetch only the files your build needs.
Language: TypeScript
License: MIT
Stars: 2
Forks: 0
Open issues: 0
Created: 2026-05-26T23:02:27Z
Pushed: 2026-05-27T01:36:00Z
Default branch: main
Fork: no
Archived: no
README:
selective-repo-fetch
Docs-as-code made practical.
Declaratively define which files are documentation and which are code. When documentation lives alongside code in large repositories, building a documentation site shouldn't require cloning the entire repo. selective-repo-fetch reads a JSON manifest that declares which files your doc pipeline needs, matches those patterns against a file listing, and tells you exactly what to fetch — nothing more.
The Problem
Docs-as-code means your documentation is:
- ✅ Versioned in git alongside source code
- ✅ Reviewed through pull requests
- ✅ Built by CI/CD pipelines
But large monorepos create real pain:
- Full clones are slow — repos with 100K+ files take minutes to clone
- API throttling is real — GitHub/Azure DevOps/GitLab rate-limit file downloads
- Doc builds only need a fraction — your manifest already declares what files matter
The Solution
selective-repo-fetch sits between your git provider API and your doc build pipeline:
┌─────────────────┐     ┌──────────────────────┐     ┌─────────────────┐
│  Git Provider   │     │ selective-repo-fetch │     │  Doc Pipeline   │
│ (file listing)  │────▶│ (manifest matching   │────▶│ (build only     │
│                 │     │ + reference filter)  │     │  matched files) │
└─────────────────┘     └──────────────────────┘     └─────────────────┘
1. Get a file listing from any git API (cheap metadata call)
2. Resolve manifest → get content matches (markdown, configs) and resource matches (images, videos)
3. Fetch the content files (small text files — fast and cheap)
4. Filter resources by reference → scan content for image and other resource references and keep only resources actually used
5. Fetch only the referenced resources → skip unreferenced large binaries entirely
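To make step 4 concrete, here is a minimal sketch of reference filtering. It is not the library's implementation: the function names are hypothetical, and it assumes references appear only as Markdown images (`![alt](path)`) or HTML `<img src="...">` tags, with paths resolved relative to the file that contains them.

```typescript
// Hypothetical sketch of reference filtering; NOT the library's code.
// Assumption: resources are referenced via ![alt](path) or <img src="path">.

// Folder part of an absolute repo path, e.g. '/docs/a.md' -> '/docs'.
function dirnameOf(filePath: string): string {
  const i = filePath.lastIndexOf('/');
  return i <= 0 ? '/' : filePath.slice(0, i);
}

// Resolve a reference against the folder of the file that contains it.
function resolveRef(fromFile: string, ref: string): string {
  const base = ref.startsWith('/') ? [] : dirnameOf(fromFile).split('/');
  const out: string[] = [];
  for (const seg of base.concat(ref.split('/'))) {
    if (seg === '' || seg === '.') continue;
    if (seg === '..') out.pop();
    else out.push(seg);
  }
  return '/' + out.join('/');
}

function sketchFilterReferenced(
  resourcePaths: string[],
  contentFileTexts: Record<string, string>,
): string[] {
  const referenced = new Set<string>();
  const patterns = [
    /!\[[^\]]*\]\(([^)\s]+)/g, // Markdown image
    /<img[^>]*\ssrc=["']([^"']+)["']/gi, // HTML image tag
  ];
  for (const [file, text] of Object.entries(contentFileTexts)) {
    for (const re of patterns) {
      for (const match of text.matchAll(re)) {
        const ref = match[1];
        if (ref) referenced.add(resolveRef(file, ref));
      }
    }
  }
  // Keep only the glob-matched resources that some content file points at.
  return resourcePaths.filter((p) => referenced.has(p));
}

const sketchKept = sketchFilterReferenced(
  ['/docs/images/architecture.png', '/docs/images/unused-screenshot.png'],
  {
    '/docs/getting-started.md': '# Getting Started\n![Architecture](images/architecture.png)\n',
    '/docs/api-reference.md': '# API Reference\nNo images here.',
  },
);
console.log(sketchKept);
```

A real implementation would likely need to handle more reference forms (links, video tags, reference-style Markdown), so treat the two patterns above as placeholders.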
Installation
npm install github:microsoft/selective-repo-fetch
Quick Start
import { resolveFileMatches, filterReferencedResources } from 'selective-repo-fetch';
// Your manifest declares what your doc site needs
const manifest = {
build: {
content: [{ files: ['**/*.md'], src: 'docs' }],
resource: [{ files: ['**/*.{png,jpg,svg}'], src: 'docs/images' }],
template: ['templates/custom'],
},
};
// Step 1: Get file listing from any git API (cheap metadata call)
const repoFiles = [
{ path: '/docs/getting-started.md' },
{ path: '/docs/api-reference.md' },
{ path: '/docs/images/architecture.png' },
{ path: '/docs/images/unused-screenshot.png' },
{ path: '/src/main.ts' }, // ← not documentation
{ path: '/scripts/deploy.ps1' }, // ← not documentation
];
// Step 2: Resolve manifest patterns → content + resource matches
const result = resolveFileMatches(repoFiles, manifest, '/', '/manifest.json');
console.log(result.contentMatches);
// ['/docs/getting-started.md', '/docs/api-reference.md']
console.log(result.resourceMatches);
// ['/docs/images/architecture.png', '/docs/images/unused-screenshot.png']
// Step 3: Fetch the content files (small text — fast and cheap)
const contentFileTexts = {
'/docs/getting-started.md': '# Getting Started\n![Architecture](images/architecture.png)\n',
'/docs/api-reference.md': '# API Reference\nNo images here.',
};
// Step 4: Filter resources to only those actually referenced in content
const referencedResources = filterReferencedResources(result.resourceMatches, contentFileTexts);
console.log(referencedResources);
// ['/docs/images/architecture.png']
// ↑ unused-screenshot.png is dropped — it matched the glob but no content file references it
// Step 5: Fetch only the referenced resources — skip unreferenced large binaries
Use Cases
Documentation portals pulling from multiple repos
Your portal builds docs from 50+ repos. Instead of cloning each one, get the tree listing and resolve only the doc files.
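As one way to obtain the tree listing, the sketch below uses GitHub's REST "Get a tree" endpoint with `recursive=1` and converts the response into the `{ path }` entries used in the Quick Start. The helper names (`treeUrl`, `toFileEntries`, `listRepoFiles`) are hypothetical, the HTTP call is injected rather than tied to a specific client, and the leading slash is an assumption based on the Quick Start's sample paths.

```typescript
// Minimal shape of the GitHub "Get a tree" response used by this sketch.
interface GitTreeItem {
  path: string; // repo-relative, no leading slash
  type: string; // 'blob' for files, 'tree' for folders, 'commit' for submodules
}
interface GitTreeResponse {
  tree: GitTreeItem[];
  truncated: boolean; // true when the listing exceeded the API's limits
}

// URL of the recursive tree listing for a branch, tag, or tree SHA.
function treeUrl(owner: string, repo: string, ref: string): string {
  return `https://api.github.com/repos/${owner}/${repo}/git/trees/${ref}?recursive=1`;
}

// Keep files only and add the leading slash (assumed from the Quick Start).
function toFileEntries(response: GitTreeResponse): { path: string }[] {
  if (response.truncated) {
    throw new Error('Tree listing was truncated; enumerate subtrees instead.');
  }
  return response.tree
    .filter((item) => item.type === 'blob')
    .map((item) => ({ path: `/${item.path}` }));
}

// getJson is supplied by the caller (fetch wrapper, Octokit, test double).
async function listRepoFiles(
  getJson: (url: string) => Promise<unknown>,
  owner: string,
  repo: string,
  ref: string,
): Promise<{ path: string }[]> {
  const body = (await getJson(treeUrl(owner, repo, ref))) as GitTreeResponse;
  return toFileEntries(body);
}
```

For a portal that spans many repos, the same conversion can run once per repo, and the resulting array is what gets passed to `resolveFileMatches`.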
AI agent knowledge bases
Selectively ingest documentation from multiple repos into a RAG pipeline — only content files, not code, tests, or CI configs. The manifest-driven separation means your agents always have fresh, accurate documentation without processing entire repositories.
Monorepo doc builds
A 200K-file monorepo where docs live in /docs, /api-docs, and scattered README.md files. The manifest declares exactly which paths matter.
Incremental content pipelines
Combined with a git diff, resolve which *documentation* files changed — not which *code* files changed.
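A minimal sketch of that combination, assuming the changed paths come from something like `git diff --name-only` (which prints repo-relative paths without a leading slash) and the documentation paths come from `contentMatches`. The helper name `changedDocFiles` is hypothetical.

```typescript
// Hypothetical helper: intersect changed paths with resolved documentation paths.
function changedDocFiles(diffPaths: string[], docMatches: string[]): string[] {
  const docs = new Set(docMatches);
  return diffPaths
    // Normalize to the leading-slash form used by the Quick Start's matches.
    .map((p) => (p.startsWith('/') ? p : `/${p}`))
    .filter((p) => docs.has(p));
}

const changedDocs = changedDocFiles(
  ['docs/getting-started.md', 'src/main.ts'],
  ['/docs/getting-started.md', '/docs/api-reference.md'],
);
console.log(changedDocs);
```

Only the surviving paths need to be refetched and rebuilt; a change that touches code alone yields an empty list.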
Static site generators (DocFX, MkDocs, Sphinx, Docusaurus)
Any SSG that uses a manifest/config to declare its inputs can benefit from pre-filtering the repo file list.
API
resolveFileMatches(files, manifest, patternPrefix?, manifestPath?)
The core function. Resolves manifest patterns against a file listing.
Parameters:
- files: FileEntry[] — array of { path: string } representing all files in the repo (from any git tree API)
- manifest: object — the manifest JSON declaring content/resource patterns
- patternPrefix: string — prefix for relative patterns (usually the manifest folder path)
- manifestPath: string — path to the manifest file, used to resolve relative src paths
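The input shapes can be sketched as TypeScript types. Only `FileEntry` is named in the README; `ManifestSection` and `Manifest` are hypothetical names inferred from the Quick Start manifest, and the optionality of each field is an assumption.

```typescript
// Named in the README: one entry per file in the repo listing.
interface FileEntry {
  path: string;
}

// Hypothetical: one content or resource section, as seen in the Quick Start.
interface ManifestSection {
  files: string[]; // glob patterns
  src?: string; // folder the patterns are relative to (assumed optional)
}

// Hypothetical: the manifest shape used in the Quick Start example.
interface Manifest {
  build: {
    content?: ManifestSection[];
    resource?: ManifestSection[];
    template?: string[];
  };
}

const exampleManifest: Manifest = {
  build: {
    content: [{ files: ['**/*.md'], src: 'docs' }],
    resource: [{ files: ['**/*.{png,jpg,svg}'], src: 'docs/images' }],
    template: ['templates/custom'],
  },
};
const exampleFiles: FileEntry[] = [{ path: '/docs/getting-started.md' }];
```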
Returns:
{
contentMatches: string[]; // Files needed for content (markdown, notebooks, configs)
resourceMatches: string[]; // Files needed as resources (images, videos, binaries)
}
resolveExternalPatterns(manifest, manifestPath?)
Discovers patterns that reference files outside the manifest folder (via src: "../other-folder"). Use this to know which additional tree paths to enumerate before calling resolveFileMatches.
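To illustrate what "outside the manifest folder" means, the sketch below resolves each section's `src` against the manifest's folder and reports the folders that fall outside it. This is not the library's function and does not reproduce its return type; `externalFolders` and `normalizePath` are hypothetical helpers.

```typescript
// Hypothetical sketch; NOT resolveExternalPatterns itself.

// Collapse '.', '..' and empty segments into an absolute repo path.
function normalizePath(p: string): string {
  const out: string[] = [];
  for (const seg of p.split('/')) {
    if (seg === '' || seg === '.') continue;
    if (seg === '..') out.pop();
    else out.push(seg);
  }
  return '/' + out.join('/');
}

// Return the src folders that resolve outside the manifest's own folder.
function externalFolders(
  sections: { src?: string }[],
  manifestPath: string,
): string[] {
  const slash = manifestPath.lastIndexOf('/');
  const manifestDir = slash <= 0 ? '/' : manifestPath.slice(0, slash);
  const insidePrefix = manifestDir === '/' ? '/' : `${manifestDir}/`;
  const external: string[] = [];
  for (const section of sections) {
    if (section.src === undefined) continue; // no src: stays in the manifest folder
    const resolved = normalizePath(`${manifestDir}/${section.src}`);
    const inside = resolved === manifestDir || resolved.startsWith(insidePrefix);
    if (!inside && !external.includes(resolved)) external.push(resolved);
  }
  return external;
}

const extraFolders = externalFolders(
  [{ src: 'guides' }, { src: '../shared/images' }, {}],
  '/docs/manifest.json',
);
console.log(extraFolders);
```

Folders found this way are the additional tree paths to enumerate before matching, since a listing scoped to the manifest folder would not contain them.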
extractStaticPathPrefix(pattern)
Extracts the non-glob prefix from a pattern — useful for converting glob patterns to API-compatible folder paths.
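One way such a prefix could be computed is to keep leading path segments until the first one that contains a glob character. The function below is an illustrative sketch under that assumption, with the hypothetical name `staticPrefixSketch`; the library's actual rules may differ.

```typescript
// Hypothetical sketch; NOT the library's extractStaticPathPrefix.
function staticPrefixSketch(pattern: string): string {
  const staticSegments: string[] = [];
  for (const segment of pattern.split('/')) {
    // Stop at the first segment that uses glob syntax.
    if (/[*?[\]{}]/.test(segment)) break;
    staticSegments.push(segment);
  }
  const prefix = staticSegments.join('/');
  // A pattern that starts with a glob has no static part: fall back to the root.
  return prefix === '' ? '/' : prefix;
}

console.log(staticPrefixSketch('/docs/**/*.md'));
console.log(staticPrefixSketch('**/*.md'));
```

In this sketch a pattern with no glob characters is returned unchanged, so a caller that needs a folder path would have to strip the file name itself.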
extractStaticPathPrefix('/docs/**/*.md') // → '/docs'
extractStaticPathPrefix('**/*.md') // → '/'
resolveResourceFiles(resourcePaths, resourceSections, manifestPath)
Resolves candidate file system paths for resource files relative to a manifest.
filterReferencedResources(resourcePaths, contentFileTexts)
Filters resource paths to only include files that are actually…
Excerpt shown — open the source for the full document.