Convert a public GitHub repository into one Markdown document.
- Accepts GitHub input as:
  - `owner/repo` or `owner/repo.git` (shorthand)
  - `https://github.com/owner/repo`
  - `https://github.com/owner/repo/tree/<branch-or-path-like-branch>`
- Resolves `HEAD` to the repository default branch.
- Fetches tree + raw file content from GitHub.
- Collects text files, with optional inclusion of dotfiles and data/log files.
- Supports include/exclude path filtering (`*` and `**` patterns).
- Optionally detects likely binary files using sampled bytes and skips them.
- Produces one Markdown output with:
  - repository snapshot
  - directory structure
  - combined source blocks
  - optional file-level TOC
  - notes for skipped/failed files
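
The accepted input forms listed above could be parsed roughly as follows. This is a minimal sketch, not the project's actual parser: the function name `parseRepoInput`, the returned object shape, and the choice to treat everything after `/tree/` as the branch are assumptions made for illustration.

```javascript
// Hypothetical helper: illustrates the documented input forms only.
function parseRepoInput(input) {
  const text = String(input).trim();

  // Full URL form: https://github.com/owner/repo[/tree/<branch>]
  if (/^https?:\/\//i.test(text)) {
    const url = new URL(text);
    if (url.hostname !== "github.com") {
      throw new Error(`Unsupported host: ${url.hostname}`);
    }
    const [owner, repoRaw, marker, ...rest] = url.pathname
      .split("/")
      .filter(Boolean);
    if (!owner || !repoRaw) {
      throw new Error("Expected /owner/repo in URL");
    }
    const repo = repoRaw.replace(/\.git$/, "");
    // Keep everything after /tree/ together, since branch names may contain "/".
    const branch =
      marker === "tree" && rest.length > 0 ? rest.join("/") : "HEAD";
    return { owner, repo, branch };
  }

  // Shorthand form: owner/repo or owner/repo.git
  const match = /^([\w.-]+)\/([\w.-]+?)(?:\.git)?$/.exec(text);
  if (!match) {
    throw new Error(`Unrecognized repository input: ${text}`);
  }
  // "HEAD" stands for the repository default branch, resolved later.
  return { owner: match[1], repo: match[2], branch: "HEAD" };
}
```

When no branch is given, the sketch returns `HEAD` and leaves resolution to a later step, mirroring the "resolves `HEAD` to the repository default branch" behavior described above.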
- Frontend: `index.html`, `styles.css`, `js/app.js`
- Core logic: `js/github-api.js`, `js/repository.js`, `js/document-builder.js`
- UI rendering/helpers: `js/ui.js`, `js/markdown.js`, `js/document-utils.js`
- Cloudflare Functions:
  - `functions/api.js` (`/api`)
  - `functions/config.js` (`/config`)
  - `functions/[owner]/[repo].js` (path adapter)
- CLI: `cli.mjs` (calls hosted API)

npx wrangler pages dev . --port 8888

Then open http://localhost:8888.

- Enter repository input.
- Choose options:
  - include TOC
  - include dotfiles
  - include `.csv/.tsv/.log/.env`
  - max file size limit (or disable)
- Optional advanced settings:
  - allow shorthand on/off
  - strict preview mode
  - detect binary-like files by byte sample
  - include/exclude path patterns (comma/newline)
  - temporary GitHub token
  - GitHub timeout and max retries
- Click Fetch & Build Markdown.
- Copy or download output.
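
To make the include/exclude path patterns more concrete, here is a minimal sketch of comma/newline splitting and `*` / `**` matching. The function names are hypothetical, and the semantics shown (`*` stays within one path segment, `**` crosses `/`, an empty include list means "everything", exclude wins over include) are assumptions rather than the project's confirmed behavior.

```javascript
// Split a comma- or newline-separated pattern string into a clean list.
function splitPatterns(raw) {
  return String(raw || "")
    .split(/[,\n]/)
    .map((pattern) => pattern.trim())
    .filter(Boolean);
}

// Translate a glob into a RegExp.
// Assumed semantics: "**" may cross "/" boundaries, "*" may not.
function globToRegExp(glob) {
  let source = "";
  for (let i = 0; i < glob.length; i += 1) {
    if (glob.startsWith("**/", i)) {
      source += "(?:.*/)?"; // zero or more leading directories
      i += 2;
    } else if (glob.startsWith("**", i)) {
      source += ".*";
      i += 1;
    } else if (glob[i] === "*") {
      source += "[^/]*";
    } else {
      // Escape characters that are special in regular expressions.
      source += glob[i].replace(/[.+?^${}()|[\]\\]/g, "\\$&");
    }
  }
  return new RegExp(`^${source}$`);
}

// Assumed policy: a path must match some include pattern (if any are given)
// and must not match any exclude pattern.
function shouldInclude(path, includePatterns, excludePatterns) {
  const included =
    includePatterns.length === 0 ||
    includePatterns.some((p) => globToRegExp(p).test(path));
  const excluded = excludePatterns.some((p) => globToRegExp(p).test(path));
  return included && !excluded;
}
```

For example, under these assumptions `shouldInclude("src/a/b.test.js", ["src/**"], ["**/*.test.js"])` is `false`, because the exclude pattern matches even though the include pattern does too.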
Status includes stage counters:
- tree: `textFiles/treeItems`
- queue: `queued/totalTextFiles` (+ skipped large/binary)
- fetch: `completed/total` (+ ok/fail)
- build: `completed/total`
curl "https://<your-pages-domain>/api?repo=owner/repo.git" -o repo.md
curl "https://<your-pages-domain>/api/owner/repo.git" -o repo.md
curl "https://<your-pages-domain>/owner/repo" -o repo.md

Query parameters:
- `repo=<input>` (required unless using path route)
- `branch=<name>`
- `includeDotfiles=true|false`
- `includeDataFiles=true|false`
- `includeToc=true|false`
- `maxKb=<number>`
- `noMax=true`
- `concurrency=<number>`
- `includePaths=<pattern1,pattern2,...>`
- `excludePaths=<pattern1,pattern2,...>`
- `detectBinaryBySample=true|false`
- `githubToken=<token>`
- `githubTimeoutMs=<number>`
- `githubMaxRetries=<number>`
- `githubCacheTtlMs=<number>`
- `githubCacheSWRMs=<number>`
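
As a rough illustration of how these query parameters combine into a request URL, the sketch below builds an `/api` URL with the standard `URL` and `URLSearchParams` APIs. The helper name `buildApiUrl` and the base URL `https://example.invalid` are placeholders, not part of the project.

```javascript
// Hypothetical helper: builds an /api request URL from an options object.
// Keys are passed through as query parameter names, so they should match
// the documented parameters (repo, branch, includeToc, maxKb, ...).
function buildApiUrl(baseUrl, options) {
  const url = new URL("/api", baseUrl);
  for (const [key, value] of Object.entries(options)) {
    if (value === undefined || value === null) continue; // skip unset options
    // Pattern lists are sent comma-separated, as in includePaths/excludePaths.
    const text = Array.isArray(value) ? value.join(",") : String(value);
    url.searchParams.set(key, text);
  }
  return url.toString();
}

// Example with a placeholder domain; substitute your own Pages domain.
const exampleUrl = buildApiUrl("https://example.invalid", {
  repo: "owner/repo",
  includeToc: true,
  maxKb: 500,
  excludePaths: ["dist/**", "*.lock"],
});
console.log(exampleUrl);
```

Letting `URLSearchParams` do the encoding avoids hand-escaping characters such as `/` and `,` in repository names and patterns.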

Token headers:
- `X-GitHub-Token: <token>`
- `Authorization: Bearer <token>`

Response headers:
- `X-Request-Id`
- `X-Cache-Status: HIT|STALE|MISS`
- `X-Stage-Tree`
- `X-Stage-Queue`
- `X-Stage-Fetch`
- `X-Stage-Build`

Error JSON includes requestId and stageCounters.
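
A client might send the token as a header and read the diagnostic headers along these lines. This is a sketch under assumptions: the helper names are hypothetical, and because the exact format of the `X-Stage-*` values is not documented here, they are returned as raw strings.

```javascript
// Build fetch options that pass the token in a header rather than the URL.
function buildRequestInit(token) {
  const headers = new Headers();
  if (token) headers.set("X-GitHub-Token", token);
  return { method: "GET", headers };
}

// Collect the diagnostic response headers into a plain object.
// Missing headers come back as null.
function readDiagnostics(headers) {
  return {
    requestId: headers.get("X-Request-Id"),
    cacheStatus: headers.get("X-Cache-Status"),
    stages: {
      tree: headers.get("X-Stage-Tree"),
      queue: headers.get("X-Stage-Queue"),
      fetch: headers.get("X-Stage-Fetch"),
      build: headers.get("X-Stage-Build"),
    },
  };
}
```

With a real deployment this would be used as `const res = await fetch(url, buildRequestInit(token));` followed by `readDiagnostics(res.headers)`. Logging `requestId` alongside failures makes it easier to correlate them with the error JSON.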

node cli.mjs <repo-input> [options]

Examples:
node cli.mjs owner/repo --service-url https://<your-pages-domain>/api --out repo.md
GIT2DOC_SERVICE_URL=https://<your-pages-domain>/api node cli.mjs owner/repo.git --out repo.md

Options:
- `--out <file>`
- `--service-url <url>` (or `GIT2DOC_SERVICE_URL`)
- `--branch <name>`
- `--max-kb <number>`
- `--no-max-size`
- `--include-dotfiles`
- `--include-data-files`
- `--include-toc`
- `--include-paths <value>`
- `--exclude-paths <value>`
- `--detect-binary-sample`
- `--quiet`
- `--help`

CLI note: if input ends with .gi, it is auto-corrected to .git.
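
The auto-correction in the CLI note can be pictured as a small normalization step. The function below is only an illustrative sketch with a hypothetical name; the real CLI may implement this differently.

```javascript
// Hypothetical helper: repairs the common ".gi" typo by appending the missing "t".
function fixGitSuffixTypo(input) {
  const text = String(input).trim();
  return text.endsWith(".gi") ? `${text}t` : text;
}

console.log(fixGitSuffixTypo("owner/repo.gi")); // owner/repo.git
```

Inputs that already end in `.git`, or that have no suffix at all, pass through unchanged.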
Server/runtime:
- `GIT2DOC_EXTRA_INPUT_HOSTS`
- `GIT2DOC_ALLOW_SHORTHAND`
- `GITHUB_TOKEN` or `GIT2DOC_GITHUB_TOKEN`
- `GIT2DOC_GITHUB_TIMEOUT_MS`
- `GIT2DOC_GITHUB_MAX_RETRIES`
- `GIT2DOC_GITHUB_CACHE_TTL_MS`
- `GIT2DOC_GITHUB_CACHE_SWR_MS`

CLI:
- `GIT2DOC_SERVICE_URL`

Defaults from code:
- max file size: `250000` bytes (about 250 KB)
- fetch concurrency: `6`
- GitHub timeout: `12000` ms
- GitHub max retries: `2`
- cache TTL: `30000` ms
- cache stale-while-revalidate: `120000` ms
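
One way to picture how these defaults interact with the server environment variables is the sketch below. The default values come from the list above, but the object shape, the field names, and the assumption that each `GIT2DOC_GITHUB_*` variable overrides the matching default are illustrative guesses, not the project's confirmed logic.

```javascript
// Defaults as listed above; field names are hypothetical.
const DEFAULTS = Object.freeze({
  maxFileBytes: 250000,
  fetchConcurrency: 6,
  githubTimeoutMs: 12000,
  githubMaxRetries: 2,
  githubCacheTtlMs: 30000,
  githubCacheSwrMs: 120000,
});

// Assumed mapping from config fields to the documented environment variables.
const ENV_KEYS = Object.freeze({
  githubTimeoutMs: "GIT2DOC_GITHUB_TIMEOUT_MS",
  githubMaxRetries: "GIT2DOC_GITHUB_MAX_RETRIES",
  githubCacheTtlMs: "GIT2DOC_GITHUB_CACHE_TTL_MS",
  githubCacheSwrMs: "GIT2DOC_GITHUB_CACHE_SWR_MS",
});

// Merge environment overrides into the defaults, ignoring unset,
// empty, non-numeric, or negative values.
function resolveConfig(env = {}) {
  const config = { ...DEFAULTS };
  for (const [field, envKey] of Object.entries(ENV_KEYS)) {
    const raw = env[envKey];
    if (raw === undefined || raw === "") continue;
    const parsed = Number(raw);
    if (Number.isFinite(parsed) && parsed >= 0) {
      config[field] = parsed;
    }
  }
  return config;
}
```

Falling back to the default on invalid input keeps a mistyped variable from silently disabling timeouts or retries.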
Run:
node --test tests/git2doc.test.mjs

Current tests cover:
- GitHub URL/shorthand parsing
- branch resolution fallback behavior
- option parsing (size, TOC, data files, dotfiles, binary detection, include/exclude patterns)

Limitations:
- Public repositories only.
- GitHub rate limits still apply (use a token for heavier usage).
- Very large repositories can still take time due to tree and content fetch volume.