Java-Idl/git2doc

git2doc

Convert a public GitHub repository into one Markdown document.

What it does

  • Accepts GitHub input as:
    • owner/repo or owner/repo.git (shorthand)
    • https://github.com/owner/repo
    • https://github.com/owner/repo/tree/<branch-or-path-like-branch>
  • Resolves HEAD to the repository's default branch.
  • Fetches the file tree and raw file contents from GitHub.
  • Collects text files, with optional inclusion of dotfiles and data/log files.
  • Supports include/exclude path filtering (* and ** patterns).
  • Optionally detects likely binary files using sampled bytes and skips them.
  • Produces one Markdown output with:
    • repository snapshot
    • directory structure
    • combined source blocks
    • optional file-level TOC
    • notes for skipped/failed files
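
The accepted input forms can be normalised to one shape before any request is made. The following is a minimal sketch, not the project's actual parser: `parseRepoInput` is a hypothetical name, and the choice to treat everything after `/tree/` as a single candidate ref is an assumption.

```javascript
// Hypothetical helper: normalise the accepted input forms to { owner, repo, ref }.
function parseRepoInput(input) {
  const trimmed = input.trim();
  let path = trimmed;
  if (/^https?:\/\//i.test(trimmed)) {
    const url = new URL(trimmed);
    if (url.hostname !== 'github.com') {
      throw new Error(`Unsupported host: ${url.hostname}`);
    }
    path = url.pathname;
  }
  const segments = path.split('/').filter(Boolean);
  if (segments.length < 2) {
    throw new Error(`Expected owner/repo, got: ${input}`);
  }
  const [owner, rawRepo, kind, ...rest] = segments;
  const repo = rawRepo.replace(/\.git$/, '');
  // Assumption: everything after /tree/ is kept as one candidate ref, because
  // a branch name may itself contain slashes. "HEAD" stands for the default branch.
  const ref = kind === 'tree' && rest.length > 0 ? rest.join('/') : 'HEAD';
  return { owner, repo, ref };
}
```

With this sketch, `owner/repo.git` and `https://github.com/owner/repo` both yield the same owner and repo with ref `HEAD`, which is then resolved to the default branch.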

Project layout

  • Frontend: index.html, styles.css, js/app.js
  • Core logic: js/github-api.js, js/repository.js, js/document-builder.js
  • UI rendering/helpers: js/ui.js, js/markdown.js, js/document-utils.js
  • Cloudflare Functions:
    • functions/api.js (/api)
    • functions/config.js (/config)
    • functions/[owner]/[repo].js (path adapter)
  • CLI: cli.mjs (calls hosted API)

Run locally

npx wrangler pages dev . --port 8888

Then open http://localhost:8888.

Browser usage

  1. Enter repository input.
  2. Choose options:
    • include TOC
    • include dotfiles
    • include .csv/.tsv/.log/.env
    • max file size limit (or disable)
  3. Optional advanced settings:
    • allow shorthand on/off
    • strict preview mode
    • detect binary-like files by byte sample
    • include/exclude path patterns (comma/newline)
    • temporary GitHub token
    • GitHub timeout and max retries
  4. Click Fetch & Build Markdown.
  5. Copy or download output.
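
The "detect binary-like files by byte sample" option can be pictured with a common heuristic. The sketch below is an assumption, not git2doc's actual rule: the function name, the sample size, and the control-byte threshold are all hypothetical.

```javascript
// Common heuristic (assumed, not taken from git2doc): treat content as binary
// if the sample contains a NUL byte or too many non-text control bytes.
function looksBinary(bytes, sampleSize = 8000, maxControlRatio = 0.3) {
  const sample = bytes.subarray(0, sampleSize);
  if (sample.length === 0) return false;
  let control = 0;
  for (const byte of sample) {
    if (byte === 0) return true;
    // Tab (9), LF (10) and CR (13) are ordinary text; other bytes below 32 are not.
    if (byte < 32 && byte !== 9 && byte !== 10 && byte !== 13) control += 1;
  }
  return control / sample.length > maxControlRatio;
}
```

Sampling only the first bytes keeps the check cheap, at the cost of occasionally misclassifying a file whose binary content starts later.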

Status includes stage counters:

  • tree: textFiles/treeItems
  • queue: queued/totalTextFiles (+ skipped large/binary)
  • fetch: completed/total (+ ok/fail)
  • build: completed/total

API usage

Supported routes

curl "https://<your-pages-domain>/api?repo=owner/repo.git" -o repo.md
curl "https://<your-pages-domain>/api/owner/repo.git" -o repo.md
curl "https://<your-pages-domain>/owner/repo" -o repo.md
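
The query-string route can also be assembled in code. This is a minimal sketch: `buildApiUrl` is a hypothetical helper, `https://example.invalid/api` is a placeholder for your own Pages domain, and the option keys are the query parameter names documented in this README.

```javascript
// Build a request URL for the /api route. `serviceUrl` is a placeholder;
// option keys mirror the query parameters documented in this README.
function buildApiUrl(serviceUrl, repo, options = {}) {
  const url = new URL(serviceUrl);
  url.searchParams.set('repo', repo);
  for (const [key, value] of Object.entries(options)) {
    if (value === undefined || value === null) continue;
    // Arrays (e.g. excludePaths) are joined into the comma-separated form.
    url.searchParams.set(key, Array.isArray(value) ? value.join(',') : String(value));
  }
  return url.toString();
}

const exampleUrl = buildApiUrl('https://example.invalid/api', 'owner/repo', {
  branch: 'main',
  includeToc: true,
  maxKb: 100,
  excludePaths: ['dist/**', '*.min.js'],
});
console.log(exampleUrl);
```

Using `URL` and `searchParams` leaves percent-encoding of slashes and commas to the platform instead of hand-built strings.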

Query parameters

  • repo=<input> (required unless using path route)
  • branch=<name>
  • includeDotfiles=true|false
  • includeDataFiles=true|false
  • includeToc=true|false
  • maxKb=<number>
  • noMax=true (disables the file size limit)
  • concurrency=<number>
  • includePaths=<pattern1,pattern2,...>
  • excludePaths=<pattern1,pattern2,...>
  • detectBinaryBySample=true|false
  • githubToken=<token>
  • githubTimeoutMs=<number>
  • githubMaxRetries=<number>
  • githubCacheTtlMs=<number>
  • githubCacheSWRMs=<number>
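
To illustrate how includePaths and excludePaths might be applied, here is a minimal sketch. The glob semantics are an assumption (`*` stays within one path segment, `**` crosses segments, excludes win over includes), and all function names are hypothetical; the project's own matcher may differ.

```javascript
// Split a comma- or newline-separated pattern list into trimmed entries.
function splitPatterns(text) {
  return text.split(/[,\n]/).map((p) => p.trim()).filter(Boolean);
}

// Translate one glob into an anchored RegExp (assumed semantics, see lead-in).
function globToRegExp(glob) {
  let source = '';
  let i = 0;
  while (i < glob.length) {
    if (glob.startsWith('**/', i)) {
      source += '(?:.*/)?'; // zero or more leading directories
      i += 3;
    } else if (glob.startsWith('**', i)) {
      source += '.*'; // anything, including slashes
      i += 2;
    } else if (glob[i] === '*') {
      source += '[^/]*'; // anything within one path segment
      i += 1;
    } else {
      source += glob[i].replace(/[.+?^${}()|[\]\\]/g, '\\$&');
      i += 1;
    }
  }
  return new RegExp(`^${source}$`);
}

// A path is kept if it matches some include (or no includes are given)
// and matches no exclude.
function isPathSelected(path, includePatterns, excludePatterns) {
  const matches = (patterns) => patterns.some((p) => globToRegExp(p).test(path));
  const included = includePatterns.length === 0 || matches(includePatterns);
  return included && !matches(excludePatterns);
}
```

For example, `isPathSelected('src/vendor/x.js', ['src/**'], ['src/vendor/**'])` is false under these rules, because the exclude pattern takes precedence.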

Optional token headers

  • X-GitHub-Token: <token>
  • Authorization: Bearer <token>

Response diagnostics

  • X-Request-Id
  • X-Cache-Status: HIT|STALE|MISS
  • X-Stage-Tree
  • X-Stage-Queue
  • X-Stage-Fetch
  • X-Stage-Build

Error responses are JSON and include requestId and stageCounters.
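
A client can pass the token in a header and read the diagnostics back from the response. The following is a minimal sketch using the standard `fetch` API; the function names are hypothetical, and the assumption that every error body parses as JSON is taken from the note above rather than verified against the service.

```javascript
// Build fetch options carrying a GitHub token in the Authorization header.
function buildRequestInit(token) {
  return token ? { headers: { Authorization: `Bearer ${token}` } } : {};
}

// Collect the diagnostic headers from a fetch Response.
function readDiagnostics(response) {
  const names = ['X-Request-Id', 'X-Cache-Status', 'X-Stage-Tree',
    'X-Stage-Queue', 'X-Stage-Fetch', 'X-Stage-Build'];
  const out = {};
  for (const name of names) {
    out[name] = response.headers.get(name); // null when the header is absent
  }
  return out;
}

// Fetch the Markdown document; apiUrl is whatever /api URL you have built.
async function fetchMarkdown(apiUrl, token) {
  const response = await fetch(apiUrl, buildRequestInit(token));
  const diagnostics = readDiagnostics(response);
  if (!response.ok) {
    // Assumption: the error body is JSON with requestId and stageCounters.
    const body = await response.json();
    throw new Error(`git2doc request ${body.requestId} failed with HTTP ${response.status}`);
  }
  return { markdown: await response.text(), diagnostics };
}
```

Sending the token in a header rather than the githubToken query parameter keeps it out of URLs that may end up in logs or shell history.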

CLI usage

node cli.mjs <repo-input> [options]

Examples:

node cli.mjs owner/repo --service-url https://<your-pages-domain>/api --out repo.md
GIT2DOC_SERVICE_URL=https://<your-pages-domain>/api node cli.mjs owner/repo.git --out repo.md

Options:

  • --out <file>
  • --service-url <url> (or GIT2DOC_SERVICE_URL)
  • --branch <name>
  • --max-kb <number>
  • --no-max-size
  • --include-dotfiles
  • --include-data-files
  • --include-toc
  • --include-paths <value>
  • --exclude-paths <value>
  • --detect-binary-sample
  • --quiet
  • --help

CLI note: if the input ends with .gi, it is auto-corrected to .git.
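
The auto-correction amounts to a one-line suffix check. A minimal sketch, assuming only a trailing `.gi` is rewritten; `fixGitSuffix` is a hypothetical name, not the function used in cli.mjs.

```javascript
// Assumed behaviour: only a trailing ".gi" is completed to ".git";
// every other input is returned unchanged.
function fixGitSuffix(input) {
  return input.endsWith('.gi') ? `${input}t` : input;
}
```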

Environment variables

Server/runtime:

  • GIT2DOC_EXTRA_INPUT_HOSTS
  • GIT2DOC_ALLOW_SHORTHAND
  • GITHUB_TOKEN or GIT2DOC_GITHUB_TOKEN
  • GIT2DOC_GITHUB_TIMEOUT_MS
  • GIT2DOC_GITHUB_MAX_RETRIES
  • GIT2DOC_GITHUB_CACHE_TTL_MS
  • GIT2DOC_GITHUB_CACHE_SWR_MS

CLI:

  • GIT2DOC_SERVICE_URL

Defaults from code:

  • max file size: 250000 bytes (about 250 KB)
  • fetch concurrency: 6
  • GitHub timeout: 12000 ms
  • GitHub max retries: 2
  • cache TTL: 30000 ms
  • cache stale-while-revalidate: 120000 ms
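
The defaults and the server environment variables could be combined along these lines. This is a sketch under assumptions: the property names, the `resolveConfig` and `readInt` helpers, and the precedence of GIT2DOC_GITHUB_TOKEN over GITHUB_TOKEN are hypothetical and not confirmed by the code.

```javascript
// Default values as listed in this README.
const DEFAULTS = {
  maxFileBytes: 250000,
  concurrency: 6,
  githubTimeoutMs: 12000,
  githubMaxRetries: 2,
  githubCacheTtlMs: 30000,
  githubCacheSwrMs: 120000,
};

// Read a non-negative integer from env, falling back when unset or invalid.
function readInt(env, name, fallback) {
  const parsed = Number.parseInt(env[name] ?? '', 10);
  return Number.isFinite(parsed) && parsed >= 0 ? parsed : fallback;
}

// Merge environment overrides onto the defaults.
function resolveConfig(env = {}) {
  return {
    ...DEFAULTS,
    // Assumption: the project-specific variable wins when both are set.
    githubToken: env.GIT2DOC_GITHUB_TOKEN || env.GITHUB_TOKEN || null,
    githubTimeoutMs: readInt(env, 'GIT2DOC_GITHUB_TIMEOUT_MS', DEFAULTS.githubTimeoutMs),
    githubMaxRetries: readInt(env, 'GIT2DOC_GITHUB_MAX_RETRIES', DEFAULTS.githubMaxRetries),
    githubCacheTtlMs: readInt(env, 'GIT2DOC_GITHUB_CACHE_TTL_MS', DEFAULTS.githubCacheTtlMs),
    githubCacheSwrMs: readInt(env, 'GIT2DOC_GITHUB_CACHE_SWR_MS', DEFAULTS.githubCacheSwrMs),
  };
}
```

Falling back on invalid values, rather than failing, keeps a mistyped variable from taking the service down; a stricter deployment might prefer to reject it.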

Tests

Run:

node --test tests/git2doc.test.mjs

Current tests cover:

  • GitHub URL/shorthand parsing
  • branch resolution fallback behavior
  • option parsing (size, TOC, data files, dotfiles, binary detection, include/exclude patterns)

Limits

  • Public repositories only.
  • GitHub rate limits still apply (use a token for heavier usage).
  • Very large repositories can still take time due to tree and content fetch volume.
